nstar Enlarge the instruction set

Currently, there are too few instructions included in N*.

Here's an exhaustive list:

jmp unconditionnally jumps to the given label;
call is a restricted form of jmp which accounts for a "return address"
ret jumps to the address in the continuation space
sld loads a value from the stack into a register
sst stores the value inside a register into the stack at the given index
salloc allocates some space on top of the stack (similarly to alloca in C, though the size is restricted by the type instead of being a literal integer)
sfree removes the top-most stack cell
sref allows fetching a reference (pointer) to a given stack cell
ld loads a value from a memory address into a register
st writes a value inside a register at a memory address
mv moves a register or immediate value into a register
nop is useless and does absolutely nothing more than take space in the resulting object

However, there are quite a lot more useful instructions to have in order to make N⋆ usable as both a programming assembly language and a compiler backend for Zilch. Here are some examples:

arithmetic operations such as add, sub, mul, etc. with floating point counterparts (addf, subf, mulf, etc. where the f suffix stands for “floating”)
conditionnal control flow operations using jz, jnz, je, jne, jl, jle, jg, jge, etc. which act as simple jumps (e.g. jz %r0, then, else jumps to the then label if the content of %r0 is 0, else jumps to the else label)
bitwise operators such as and, or, xor, not, etc. Boolean logic operators may also be of great use as specific instructions, in order not to fall into traps such as not 5 != 0 (we could have notb 5 = 0, or andb 1 2 = 1, where every non-zero value is considered the truth value)

Of course this list is absolutely not exhaustive (although it should be sufficient in the beginning). Many more instructions could be added (even for specific things such as SIMD), as long as they can be typed properly inside N⋆. All the above instructions will need to have types, but this can be done while adding them to the language/compiler.

Roadmap

[x] logical instructions:
- [x] and: bitwise logical AND
- [x] or: bitwise logical OR
- [x] xor: bitwise logical XOR
- [x] not: bitwise logical NOT
- [x] shiftl: bitwise logical LEFT SHIFT
- [x] shiftr: bitwise logical RIGHT SHIFT
[x] Conditional moves:
- [x] cmvz: conditionally move if zero
- [x] cmvnz: conditionally move if not zero
- [x] cmvl: conditionally move if less than
- [x] cmvle: conditionally move if less than or equal
- [x] cmvg: conditionally move if greater than
- [x] cmvge: conditionally move if greater than or equal
- [x] cmve: conditionally move if equal
- [x] cmvne: conditionally move if not equal
[x] Conditional jumps:
- [x] cjz: jump if zero
- [x] cjnz: jump if not zero
- [x] cjl: jump if less than
- [x] cjle: jump if less than or equal
- [x] cjg: jump if greater than
- [x] cjge: jump if greater than or equal
- [x] cje: jump if equal
- [x] cjne: jump if not equal
[ ] Integer arithmetic instructions:
- [x] add: add two integers
- [x] sub: subtract one integer from another
- [x] mul: multiply two integers together
- [ ] div: divide one integer by another

Jun 17 '22 08:06 Mesabloo

Note on boolean instructions: there are no direct equivalent in modern machine code (x86 does not have such concept of "boolean"). notb %r0 will therefore be compiled to code similar to (on x86)

CMP $0, %rax     # Compare %r0 with 0, set ZF=1 if equal
SETE %al         # set %r0[56:8] to ZF
MOVZX %al, %rax  # Copy %ro[56:8] to %r0, filling %ro[0:56] with zeros

The last two operations could be replaced with

CMOVZ $1, %rax  # set %r0 to 1 if ZF=1
CMOVNZ $0, %rax  # set %r0 to 0 if ZF=0

although performance could be different (because CMOVcc needs to look at the flags every time, whereas only SETE looks at them above).

As there is no notion of "compare flag (CF)" and "zero flag (ZF), the cmp operation cannot exist in N⋆. This explains the need for boolean instructions.

On ARM, we may have these opcodes

cmp r0, #0
ite eq        /* if-then-else  */
moveq r0, #1  /* r0 = 1 iff r0 = 0 */
movne r0, #0  /* r0 = 0 iff r0 = 1 */
uxtb r0       /* 0-extend %r0 */

On ARM64 (AArch64) we have

cmp w0, #0
cset w0, eq
and w0, w0, K  /* where K depends on the size of w0, as long as it is 1-filled (e.g. 0xFFFF) */

The idea is that the and is used to 0-fill the register.

On MIPS, we may have

sltu $0, $0, 1
andi $0, $0, K  # same as above for ARM64

Jun 17 '22 09:06 Mesabloo

Conditional moves are also a very good part of the instruction set. They avoid most of branches. See the example above for notb, which uses conditional moves. If they were not available, one would have to define notb as the following:

  cmp %rax, $0
  jz set1
set0:
  mv $0, %rax
  j end_not
set1:
  mv $1, %rax
end_not:
  # ...

This is very heavy and makes quite a lot of jumps… And it's also not quite readable compared to the earlier version.

In N* though, this control flow is a lot heavier:

  # ...
  jz %r0, set1, set0

set0: ...
  mv %r0, 1;
  jmp end_not<...>
set1: ...
  mv %r0, 0;
  jmp end_not<...>

end_not: ...
  # ...

If we take a look at the omitted type, one will see that the types for both set0 and set1 are exactly the same (they require the continuation in the same space, the same registers, and will return the same registers. end_not on the other end is the same type as for set0 with the added constraint that it must take %r0: uN (which we may not propagate in both set0 and set1). So there is quite a lot of redundancy for that few instructions, which conditional moves actually solve (by integrating them within the type inference rules).

As we have every kind of branching instructions (see above), we will also need all the conditional moves for the same “branchings”.

Sep 22 '22 06:09 Mesabloo

nstar nstar copied to clipboard

Enlarge the instruction set

Roadmap

nstar
nstar copied to clipboard