nstar
nstar copied to clipboard
Enlarge the instruction set
Currently, there are too few instructions included in N*.
Here's an exhaustive list:
jmpunconditionnally jumps to the given label;callis a restricted form ofjmpwhich accounts for a "return address"retjumps to the address in the continuation spacesldloads a value from the stack into a registersststores the value inside a register into the stack at the given indexsallocallocates some space on top of the stack (similarly toallocain C, though the size is restricted by the type instead of being a literal integer)sfreeremoves the top-most stack cellsrefallows fetching a reference (pointer) to a given stack cellldloads a value from a memory address into a registerstwrites a value inside a register at a memory addressmvmoves a register or immediate value into a registernopis useless and does absolutely nothing more than take space in the resulting object
However, there are quite a lot more useful instructions to have in order to make N⋆ usable as both a programming assembly language and a compiler backend for Zilch. Here are some examples:
- arithmetic operations such as
add,sub,mul, etc. with floating point counterparts (addf,subf,mulf, etc. where thefsuffix stands for “floating”) - conditionnal control flow operations using
jz,jnz,je,jne,jl,jle,jg,jge, etc. which act as simple jumps (e.g.jz %r0, then, elsejumps to thethenlabel if the content of%r0is0, else jumps to theelselabel) - bitwise operators such as
and,or,xor,not, etc. Boolean logic operators may also be of great use as specific instructions, in order not to fall into traps such asnot 5 != 0(we could havenotb 5 = 0, orandb 1 2 = 1, where every non-zero value is considered the truth value)
Of course this list is absolutely not exhaustive (although it should be sufficient in the beginning). Many more instructions could be added (even for specific things such as SIMD), as long as they can be typed properly inside N⋆. All the above instructions will need to have types, but this can be done while adding them to the language/compiler.
Roadmap
- [x] logical instructions:
- [x]
and: bitwise logical AND - [x]
or: bitwise logical OR - [x]
xor: bitwise logical XOR - [x]
not: bitwise logical NOT - [x]
shiftl: bitwise logical LEFT SHIFT - [x]
shiftr: bitwise logical RIGHT SHIFT
- [x]
- [x] Conditional moves:
- [x]
cmvz: conditionally move if zero - [x]
cmvnz: conditionally move if not zero - [x]
cmvl: conditionally move if less than - [x]
cmvle: conditionally move if less than or equal - [x]
cmvg: conditionally move if greater than - [x]
cmvge: conditionally move if greater than or equal - [x]
cmve: conditionally move if equal - [x]
cmvne: conditionally move if not equal
- [x]
- [x] Conditional jumps:
- [x]
cjz: jump if zero - [x]
cjnz: jump if not zero - [x]
cjl: jump if less than - [x]
cjle: jump if less than or equal - [x]
cjg: jump if greater than - [x]
cjge: jump if greater than or equal - [x]
cje: jump if equal - [x]
cjne: jump if not equal
- [x]
- [ ] Integer arithmetic instructions:
- [x]
add: add two integers - [x]
sub: subtract one integer from another - [x]
mul: multiply two integers together - [ ]
div: divide one integer by another
- [x]
Note on boolean instructions: there are no direct equivalent in modern machine code (x86 does not have such concept of "boolean").
notb %r0 will therefore be compiled to code similar to (on x86)
CMP $0, %rax # Compare %r0 with 0, set ZF=1 if equal
SETE %al # set %r0[56:8] to ZF
MOVZX %al, %rax # Copy %ro[56:8] to %r0, filling %ro[0:56] with zeros
The last two operations could be replaced with
CMOVZ $1, %rax # set %r0 to 1 if ZF=1
CMOVNZ $0, %rax # set %r0 to 0 if ZF=0
although performance could be different (because CMOVcc needs to look at the flags every time, whereas only SETE looks at them above).
As there is no notion of "compare flag (CF)" and "zero flag (ZF), the cmp operation cannot exist in N⋆. This explains the need for boolean instructions.
On ARM, we may have these opcodes
cmp r0, #0
ite eq /* if-then-else */
moveq r0, #1 /* r0 = 1 iff r0 = 0 */
movne r0, #0 /* r0 = 0 iff r0 = 1 */
uxtb r0 /* 0-extend %r0 */
On ARM64 (AArch64) we have
cmp w0, #0
cset w0, eq
and w0, w0, K /* where K depends on the size of w0, as long as it is 1-filled (e.g. 0xFFFF) */
The idea is that the and is used to 0-fill the register.
On MIPS, we may have
sltu $0, $0, 1
andi $0, $0, K # same as above for ARM64
Conditional moves are also a very good part of the instruction set. They avoid most of branches.
See the example above for notb, which uses conditional moves. If they were not available, one would have to define notb as the following:
cmp %rax, $0
jz set1
set0:
mv $0, %rax
j end_not
set1:
mv $1, %rax
end_not:
# ...
This is very heavy and makes quite a lot of jumps… And it's also not quite readable compared to the earlier version.
In N* though, this control flow is a lot heavier:
# ...
jz %r0, set1, set0
set0: ...
mv %r0, 1;
jmp end_not<...>
set1: ...
mv %r0, 0;
jmp end_not<...>
end_not: ...
# ...
If we take a look at the omitted type, one will see that the types for both set0 and set1 are exactly the same (they require the continuation in the same space, the same registers, and will return the same registers.
end_not on the other end is the same type as for set0 with the added constraint that it must take %r0: uN (which we may not propagate in both set0 and set1).
So there is quite a lot of redundancy for that few instructions, which conditional moves actually solve (by integrating them within the type inference rules).
As we have every kind of branching instructions (see above), we will also need all the conditional moves for the same “branchings”.