Source files format

EduMIPS64 tries to follow the conventions used in other MIPS64 and DLX simulators, so that long-time users will not be confused by its syntax.

There are two sections in a source file, the data section and the code section, introduced respectively by the .data and the .code directives. In the following listing you can see a very basic EduMIPS64 program:

; This is a comment
        .data
label:  .word   15     ; This is an inline comment

        .code
        daddi   r1, r0, 0
        syscall 0

Any combination of spaces and tabs can be used to separate the parts of each source code line: the parser ignores repeated spaces and only uses whitespace to separate tokens.

Comments are introduced by the “;” character; everything that follows it on the line is ignored. A comment can therefore be used “inline” (after a directive) or on a line by itself.

Labels can be used in the code to reference a memory cell or an instruction. They are case insensitive. Only one label can be used on each source code line. A label can be placed one or more lines above the data declaration or instruction it refers to, provided that there is nothing except comments and empty lines between the label and the declaration.
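As a minimal sketch of the placement rule above, the listing below puts a data label on its own line, separated from its declaration only by a comment and an empty line. The label name counter is an arbitrary choice for illustration.

```asm
        .data
counter:
; only comments and empty lines may sit between a label and its declaration

        .word   15

        .code
        ld      r1, counter(r0)   ; load the double word declared after counter
        syscall 0
```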

Multiple labels can refer to the same address. This is useful when two or more labels are placed on consecutive lines before an instruction, for example:

        .code
        daddi   r1, r0, 1
loop_end:
loop_begin:
        daddi   r1, r1, -1

In this case, both loop_end and loop_begin will point to the same instruction. Each label must still have a unique name; using the same label name twice will result in a parser error.

Memory limits

EduMIPS64 has a fixed memory size for both data (the .data section, capped at 640 kB – i.e., 80000 64-bit values) and instructions (the .code section, capped at 128 kB – i.e., 32000 instructions, each occupying 32 bits).

These limits are hardcoded in the simulator.

The .data section

The data section contains commands that specify how the memory must be filled before program execution starts. The general form of a .data command is:

[label:] .datatype value1 [, value2 [, ...]]

EduMIPS64 supports several data types, which are described in the following table.

Type	Directive	Bits required
Byte	.byte	8
Half word	.word16	16
Word	.word32	32
Double Word	.word or .word64	64

Please note that a double word can be introduced either by the .word directive or by the .word64 directive.

All the data types are interpreted as signed. This means that integer literals in the .data section must be between -2^(n-1) and 2^(n-1) - 1 (inclusive), where n is the number of bits required by the data type.
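To make the signed ranges concrete, the following sketch declares the smallest and largest literal for the 8-bit, 16-bit and 32-bit types, computed from the formula above. The label names are arbitrary.

```asm
        .data
b_min:  .byte   -128          ; -2^7
b_max:  .byte   127           ; 2^7 - 1
h_min:  .word16 -32768        ; -2^15
h_max:  .word16 32767         ; 2^15 - 1
w_min:  .word32 -2147483648   ; -2^31
w_max:  .word32 2147483647    ; 2^31 - 1

        .code
        syscall 0
```

By the same formula, a literal such as 128 in a .byte directive falls outside the allowed range.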

There is a big difference between declaring a list of data elements with a single directive and declaring them with multiple directives of the same type. EduMIPS64 starts writing from the next 64-bit double word each time it finds a datatype identifier. In the following listing, the first .byte statement puts the numbers 1, 2, 3 and 4 in 4 consecutive bytes (32 bits), while the next four rows put each number in a different 64-bit memory cell, occupying 32 bytes in total:

.data
.byte    1, 2, 3, 4
.byte    1
.byte    2
.byte    3
.byte    4

In the following table, the memory is represented using byte-sized cells and each row is 64 bits wide. The address on the left side of each row refers to the right-most memory cell, which has the lowest address of the eight cells in the row.

0	0	0	0	0	4	3	2	1
8	0	0	0	0	0	0	0	1
16	0	0	0	0	0	0	0	2
24	0	0	0	0	0	0	0	3
32	0	0	0	0	0	0	0	4

There are some special directives that need to be discussed: .space, .ascii and .asciiz.

The .space directive is used to leave some free space in memory. It takes an integer parameter that indicates the number of bytes to leave empty. It is handy when you need to reserve memory for the results of your computations.
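A minimal sketch of this use of .space follows: it reserves 8 bytes (one double word) and stores a computed value there. The label names value and result are arbitrary.

```asm
        .data
value:  .word   42
result: .space  8                ; reserve 8 bytes for the result

        .code
        ld      r1, value(r0)    ; r1 = 42
        daddi   r1, r1, 1        ; r1 = 43
        sd      r1, result(r0)   ; store r1 in the reserved space
        syscall 0
```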

The .ascii directive accepts strings containing any of the ASCII characters, plus some special C-like escape sequences, which are described in the following table, and puts those strings in memory.

Escape sequence	Meaning	ASCII code
\0	Null byte	0
\t	Horizontal tabulation	9
\n	Newline character	10
\"	Literal quote character	34
\\	Literal backslash character	92

The .asciiz directive behaves exactly like the .ascii directive, except that it automatically ends the string with a null byte.
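The sketch below contrasts the two string directives and uses a few of the escape sequences from the table. The label names and string contents are arbitrary.

```asm
        .data
raw:    .ascii  "abc"            ; 3 bytes, no terminator is added
manual: .ascii  "abc\0"          ; explicit null byte, 4 bytes
auto:   .asciiz "abc"            ; null byte appended automatically, 4 bytes
line:   .asciiz "col1\tcol2\n"   ; tab and newline escapes
quoted: .asciiz "say \"hi\""     ; escaped double quotes

        .code
        syscall 0
```

Here manual and auto end up with the same four bytes in memory.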

The .code section

The code section contains the instructions that are loaded into memory when the program starts. The general form of a .code command is:

[label:] instruction [param1 [, param2 [, param3]]]

The code section can also be introduced with the .text alias.

The number and the type of parameters depend on the instruction itself.

Instructions can take three types of parameters:

Registers: a register parameter is indicated by an uppercase or lowercase “r”, or a $, followed by the number of the register (between 0 and 31), as in “r4”, “R4” or “$4”.
Immediate values: an immediate value can be a number or a label. The number can be specified in base 10, base 16 or base 2: base 10 numbers are written as they are, base 16 numbers take the prefix “0x”, and base 2 numbers take the prefix “0b”. Immediate values can be preceded by the # character.
Addresses: an address is composed of an immediate value followed by a register name enclosed in parentheses. The value of the register is used as the base, and the value of the immediate is the offset.
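The three parameter types can be sketched in a short program such as the one below. It relies only on the syntax described above; the label value is an arbitrary name.

```asm
        .data
value:  .word   10

        .code
        daddi   r1, r0, 10        ; lowercase r prefix, base 10 immediate
        daddi   R2, r0, 0x0A      ; uppercase R prefix, base 16 immediate
        daddi   $3, r0, 0b1010    ; $ prefix, base 2 immediate
        daddi   r4, r0, #10       ; immediate preceded by #
        ld      r5, value(r0)     ; address: offset value, base register r0
        syscall 0
```

The four daddi instructions all add the value 10 to r0.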

The size of immediate values is limited by the number of bits that are available in the bit encoding of the instruction.

When 16-bit immediates can be used, for example in ALU I-Type instructions, a memory label can also be used as the immediate value. The assembler replaces the label with the memory address it points to.
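As a sketch of a memory label used as an immediate, the listing below first places the address of a label in a register and then uses that register as the base of a load. The label names are arbitrary.

```asm
        .data
first:  .word   1
second: .word   2

        .code
        daddi   r1, r0, second   ; r1 = address of second (no memory access)
        ld      r2, 0(r1)        ; r2 = double word stored at that address
        syscall 0
```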

Label offset arithmetic

In the address form of a parameter (e.g. the offset of a load or store instruction) the immediate value can be a simple expression built from memory labels and numeric literals combined with the + and - operators. A leading + or - is also accepted and whitespace is allowed between the operands and the operators. Each operand is either a numeric literal (in base 10, 16 or 2) or a memory label defined in the .data section; labels are replaced by their address before the expression is evaluated.

For example, given the data definitions data1: .word 42 and data2: .word 43, the following forms are all accepted:

lw r1, data1+8(r0)       ; load from data1 offset by 8 bytes
lw r1, data1-8(r0)       ; load from data1 offset by -8 bytes
lw r1, 0+data1(r0)       ; equivalent to data1(r0)
lw r1, data2-data1(r0)   ; difference between two label addresses
lw r1, data1-8+16(r0)    ; chains of + and - are supported

Expressions with an empty operand (for example data1+ or data1++0) are rejected as malformed, and an unknown label anywhere in the expression produces the usual “label not found” error.

You can use standard MIPS assembly aliases to address the 32 integer registers, appending the alias to one of the standard register prefixes “r”, “$” and “R”. See the next table.

Register	Alias
0	zero
1	at
2	v0
3	v1
4	a0
5	a1
6	a2
7	a3
8	t0
9	t1
10	t2
11	t3
12	t4
13	t5
14	t6
15	t7
16	s0
17	s1
18	s2
19	s3
20	s4
21	s5
22	s6
23	s7
24	t8
25	t9
26	k0
27	k1
28	gp
29	sp
30	fp
31	ra
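As a short sketch of the aliases in the table, the instructions below use the $ prefix with zero, t0 and t1. The comments give the equivalent numbered form.

```asm
        .code
        daddi   $t0, $zero, 5    ; same as daddi r8, r0, 5
        daddi   $t1, $t0, 1      ; same as daddi r9, r8, 1
        syscall 0
```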

The #include command

Source files can contain the #include filename command, which replaces the line containing the command with the contents of the file filename. It is useful for including external routines, and it comes with a loop-detection algorithm that warns you if, for example, file B.s contains “#include A.s” and file A.s contains “#include B.s”.
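A minimal sketch of the command in use follows. The file name routines.s is hypothetical. Because its lines are inserted in the middle of the code section, it is assumed here to contain only instructions.

```asm
        .data
value:  .word   7

        .code
        ld      r1, value(r0)
; the lines of the hypothetical file routines.s replace the next line
#include routines.s
        syscall 0
```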