spin2cpp
spin2cpp copied to clipboard
Bytecode further work tracking issue
Here's all the stuff that'd be nice to do with the bytecode output. Mostly just so I don't forget when I come back to it.
- [x] Add bytecode tests to test suite (or repurpose existing tests) (Eric added exec tests, still no expected output tests)
- [ ] update documentation with relevant info
- [x] CASE/LOOK* with lookup table (LOOK* still isn't, but whatever)
- Needs way to word-align the table
- [x] Investigate cycle counts for different constant encodings
- [x] Use logic AND instead of two jumps when compiling a K_BOOL_AND condition where the right side is trivial (single memop and such)
- [x] Introduce -Os to use suboptimal constructs to save bytes (such as encoding negative constants as abs value + a NEG)
- [x] Optimize
x + (-1)
intox - 1
- [x] Optimize
x == 0
andx <> 0
conditions - [x] Optimize empty
repeat while
(currently compiles 1 unnecessary jump. Could be a peephole) - [ ] Improve listing output (incorporate source lines)
- Perhaps difficult due to reordering optimization
- [x] Support some flexspin-isms (subobject variables, object pointers)
- [ ] Maybe: compiling of runtime-loadable objects
- [ ] implement byte/word locals
- [x] fix loop-reduce optimization
- [ ] Integrate optimized RAM interpreter for P1
- [ ] Implement limited inlining (for getter/setter functions that are common in Spin)
- [x] Investigate P2/spin2 support
- [ ] Add C-specific optimization (calls to global modules)
I've started work on C/BASIC support, or at least support for the runtime libraries. I'm able to compile a very simple BASIC program (LED blinker) now. I think we'll need a helper COG for printf and inline assembly, but that should only be pulled in if actually required.
Running inline ASM in another cog seems not very useful - takes a cog and would acess that cog's registers.
I'd just not allow it with the ROM interpreter.
You can also get away without a cog for serial if you can live with low baud rate (19200 seems to work)
Not sure how to handle variadic functions though. Dynamic parameter count is easy, but then all the locals shift around. I think the locals would need to be put before the parameters and the caller needs to make space for them on the stack (presumably by using a REG_MODIFY to increment DCURR?)
Also, how'd we include the interpreter code (when not using the ROM interpreter / on P2)? I guess the same kinda thing that happens with the syscode, though I have a very strong distaste for manual updating of intermediate files, i.e. it not automatically updating the header file from the source file (and IDK what tool is used to generate it to begin with). I think inline ASM can be used to include a file directly, though that'll only work for compilers with GAS-like inline asm. I think as is flexspin only builds on GCC/LLVM, anyways, so that wouldn't be an issue?
The .spin files in the sys/ directory are converted to binary blobs with xxd automatically, and we can do something similar for the interpreter (although it will have to be compiled first, but we can do the compilation with flexspin itself). The .spin.h files are checked in to github to help people bootstrap and to make it easier to build on Windows (not sure if xxd is widely available there, it's a standard Unix tool though).
Could you, like, not close the tracking issue please?
Also,
not sure if xxd is widely available there
xxd seems to be included with Git for Windows, so it seems unlikely that it'd be not installed, though it isn't on PATH (Git Bash seems to add it though.
Sorry, clicked on the wrong button
For multiple return values, we could perhaps use the DIRB, INB, and OUTB registers, e.g. the first return value goes on the stack as usual, second one goes in DIRB, third one in INB, and fourth one in OUTB. For plain Spin programs this won't be an issue (they don't have multiple returns) but it will allow us to compile Spin2, BASIC, and C.
Yeah, that seems like a good workaround. Though for Spin2 they'll still need to be allocated on the stack for indexing and such. The same add-to-DCURR thing that'll need to happen for varargs, I guess.
the same add-to-DCURR thing that'll need to happen for varargs, I guess.
Actually, results are zero-initialized in Spin2, too, right? so it'd have to be bunch of zero constants before the parameter expressions. That's smaller than adding to DCURR for small counts, anyways.
There's a new test script, Test/runtests_bc.sh, for testing byte code. I've disabled the C and BASIC tests (we know they won't work). Quite a few of the Spin tests do work, and one of them showed a bug that I've fixed (converting division into right shift fails for negative numbers that are not powers of 2).
Of the remaining failing tests, I think the big missing features are multiple assignment (exec03) and method/object pointers (exec04). exec11 is a timing test so its failure is entirely expected (and I should add an #ifdef for bytecode output so we can ignore it).
multi-assign between variables should be easy, I think? Unless there's a silly Spin2 quirk that makes it stupid, that is. Just eval all the expressions, then the stores in reverse order. Not sure if that's the right eval order for the right hand though.
Calling through pointers is a problem with the P1 ROM interpreter. The obvious idea is to create a dummy header entry and rewrite it's offset before the call, but
- not thread/multicore safe
- could be made multicore safe by using 8 dummies and indexing with cogid, but that's lame.
- function header entries need to know the local size
For object pointers, I think just writing them to VBASE between the call parameters and the call opcode would work? That'd be thread-safe and probably less opcodes. Would still need a dummy header entry with zero offset.
C-style Function pointers, uhhhh, idk. Again, need to know the local size to call a function. I guess make it like a Spin2 method pointer, but with PBASE instead of VBASE
Spin2-style method pointers would need quite a bit of helper code to work on P1 and will require implementing Spin2-style RTTI (wherein the first VAR long of an object gets PBASE written to it when taking a method pointer). There'll need to be a special helper function to handle calls to a method pointer that
- sets VBASE from the method pointer
- sets PBASE from the RTTI pointer (now at VBASE+0)
- indexes into PBASE with the ID part of the method pointer to get the method's header entry
- allocates the locals
- jumps into the method
I think the pointer will need to be passed in a register, because it simplifies the helper function a lot.
I think this'd work:
PUB __methodptr_helper
' doesn't declare parameters, but will be called with whatever the target function expects
__interp_vbase := inb[0..15]
__interp_pbase := long[__interp_vbase][0] ' There is no syntax for reading the variable at a given offset. I guess we could just optimize this workaround.
result := long[__interp_pbase][inb[20..31]] ' This is where Spin2 keeps the method ID, but for P1 it'd be more sensible to be at 24..31
__interp_dcurr += result.word[1]~ ' allocate locals
inb := 0
__interp_pcurr := result.word[0]~ + __interp_pbase ' jump into function
' Same thing but for function pointer
PUB __funcptr_helper
__interp_pbase := inb[0..15]
result := long[__interp_pbase][inb[20..31]]
__interp_dcurr += result.word[1]~ ' allocate locals
inb := 0
__interp_pcurr := result.word[0]~ + __interp_pbase ' jump into function
Also, I think:tm: the proper fix for the division to right-shift would be change the else if (shiftOptOp && isPowerOf2(rightVal))
further down into else if (shiftOptOp && rightVal > 0 && isPowerOf2(rightVal))
Dave Hein created a method pointer class in Spin1 that used 4 words to store pcurr, vbase, stack variable offset, and pbase. This aligns pretty well with how the assembly code handles it (it uses 2 longs instead of 4 words, but same amount of space).
The division right shift issue is a problem if the left hand side is negative. That is, -1 SHR 1 is still -1, whereas -1 / 2 should give 0. In the assembly for x / 2^N we generate something like (x < 0) ? -(ABS(x)>>N) : (x>>N), but I'm not sure if it's worth doing that for bytecode? I guess it's quite a bit faster than division, but it'll consume a lot more space, too.
There's problems with that approach though
- Spin2 interpeter supports method pointers natively, so using similar on P1 seems appropriate
- Doesn't fit in a single long
- Need to know the local size of the called function (which would prevent the aforementioned idea of loading objects at runtime)
- More code to init each pointer
The division right shift issue is a problem if the left hand side is negative. That is, -1 SHR 1 is still -1, whereas -1 / 2 should give 0. In the assembly for x / 2^N we generate something like (x < 0) ? -(ABS(x)>>N) : (x>>N), but I'm not sure if it's worth doing that for bytecode? I guess it's quite a bit faster than division, but it'll consume a lot more space, too.
Ahh, of course.
Doing that workaround is likely not (much) faster than an actual divide and doesn't work as an assignment, so yeah, probably just remove it. Keep the no-op optimization though.
Here's another idea for function/method pointers: they could always point at the descriptor for the function inside the object (the one with the dcurr and pcurr offsets). Immediately after any function whose address was taken we would insert a dummy function descriptor which points back at pbase. The function pointer itself would then fit in 16 bits, and for method pointers we'd need an additional 16 bits for vbase.
That could work well, yeah. Any major advantage over the Spin2-like approach though?
Unrelatedly, I think I'm going to look into jump table CASE now. Saying that so we don't step on our feet.
Sounds good, thanks. I'm going to take a stab at function/method pointers, so as to get C/BASIC printing to work (the file handles use a table of pointers).
Regarding the zero-extend operator, it'd be useful to optimize constant zero-extend into an AND. Can you add that to OptimizeOperator? Also, setting unary
to have it not compile the expression again seems really jank, I think it'd be better to just push the SHR/SAR op in the case and return, like BOOL_AND does.
Also, I think it's just implemented wrong to begin with, shouldn't the shift amount be 31-x
or something?
Also, the current implementation for jump table generation relies on arbitrary GOTO and there being a variable holding the expression result. The former needs to be implemented for C/BASIC, anyways, the latter I'll see if I can work around.
RE: the zeroext thing
As I said, the zeroext to AND transform should really be in OptimizeOperator
, since that way it can apply to assignments. Also, if you want a constant, you can just use BCCompileInteger
instead of allocating an integer and running it through BCCompileExpression
I've got the jump tables implemented, will finish up tomorrow. Is in my BC-casefast branch. Main problem RN is that the jump table index needs unsigned min/max to be computed. I've built an optimization that converts unsigned ops to signed ones where it makes sense, but right now that only works if the first case is zero (because otherwise a subtraction is inserted) and the expression is explicitly masked (#157 blocks it from detecting small unsigned types).
But I think the actual solution is to insert an additional OTHER case at the beginning of a table, such that an expression like ((x-offset)<#max)>#0
can be used.
The unsigned->signed optimization will be neat for C though, I guess.
Yes, C is going to need a bunch of unsigned operators that Spin1 doesn't. They'll come in handy for Spin2 as well.
I've implemented some simplified I/O and enough unsigned operators to get simple BASIC and C "hello world" style programs working. The unsigned operators are converted to function calls for now; we may be able to figure out better ways to implement them in the future.