spin2cpp Bytecode further work tracking issue

Here's all the stuff that'd be nice to do with the bytecode output. Mostly just so I don't forget when I come back to it.

[x] Add bytecode tests to test suite (or repurpose existing tests) (Eric added exec tests, still no expected output tests)
[ ] update documentation with relevant info
[x] CASE/LOOK* with lookup table (LOOK* still isn't, but whatever)
- Needs way to word-align the table
[x] Investigate cycle counts for different constant encodings
[x] Use logic AND instead of two jumps when compiling a K_BOOL_AND condition where the right side is trivial (single memop and such)
[x] Introduce -Os to use suboptimal constructs to save bytes (such as encoding negative constants as abs value + a NEG)
[x] Optimize x + (-1) into x - 1
[x] Optimize x == 0 and x <> 0 conditions
[x] Optimize empty repeat while (currently compiles 1 unnecessary jump. Could be a peephole)
[ ] Improve listing output (incorporate source lines)
- Perhaps difficult due to reordering optimization
[x] Support some flexspin-isms (subobject variables, object pointers)
- [ ] Maybe: compiling of runtime-loadable objects
[ ] implement byte/word locals
[x] fix loop-reduce optimization
[ ] Integrate optimized RAM interpreter for P1
[ ] Implement limited inlining (for getter/setter functions that are common in Spin)
[x] Investigate P2/spin2 support
[ ] Add C-specific optimization (calls to global modules)

Jun 03 '21 11:06 Wuerfel21

I've started work on C/BASIC support, or at least support for the runtime libraries. I'm able to compile a very simple BASIC program (LED blinker) now. I think we'll need a helper COG for printf and inline assembly, but that should only be pulled in if actually required.

Jun 09 '21 00:06 totalspectrum

Running inline ASM in another cog seems not very useful - takes a cog and would acess that cog's registers.

I'd just not allow it with the ROM interpreter.

Jun 09 '21 07:06 Wuerfel21

You can also get away without a cog for serial if you can live with low baud rate (19200 seems to work)

Jun 09 '21 07:06 Wuerfel21

Not sure how to handle variadic functions though. Dynamic parameter count is easy, but then all the locals shift around. I think the locals would need to be put before the parameters and the caller needs to make space for them on the stack (presumably by using a REG_MODIFY to increment DCURR?)

Jun 09 '21 07:06 Wuerfel21

Also, how'd we include the interpreter code (when not using the ROM interpreter / on P2)? I guess the same kinda thing that happens with the syscode, though I have a very strong distaste for manual updating of intermediate files, i.e. it not automatically updating the header file from the source file (and IDK what tool is used to generate it to begin with). I think inline ASM can be used to include a file directly, though that'll only work for compilers with GAS-like inline asm. I think as is flexspin only builds on GCC/LLVM, anyways, so that wouldn't be an issue?

Jun 09 '21 16:06 Wuerfel21

The .spin files in the sys/ directory are converted to binary blobs with xxd automatically, and we can do something similar for the interpreter (although it will have to be compiled first, but we can do the compilation with flexspin itself). The .spin.h files are checked in to github to help people bootstrap and to make it easier to build on Windows (not sure if xxd is widely available there, it's a standard Unix tool though).

Jun 09 '21 18:06 totalspectrum

Could you, like, not close the tracking issue please?

Jun 09 '21 18:06 Wuerfel21

Also,

not sure if xxd is widely available there

xxd seems to be included with Git for Windows, so it seems unlikely that it'd be not installed, though it isn't on PATH (Git Bash seems to add it though.

Jun 09 '21 18:06 Wuerfel21

Sorry, clicked on the wrong button

Jun 09 '21 19:06 totalspectrum

For multiple return values, we could perhaps use the DIRB, INB, and OUTB registers, e.g. the first return value goes on the stack as usual, second one goes in DIRB, third one in INB, and fourth one in OUTB. For plain Spin programs this won't be an issue (they don't have multiple returns) but it will allow us to compile Spin2, BASIC, and C.

Jun 10 '21 12:06 totalspectrum

Yeah, that seems like a good workaround. Though for Spin2 they'll still need to be allocated on the stack for indexing and such. The same add-to-DCURR thing that'll need to happen for varargs, I guess.

Jun 10 '21 13:06 Wuerfel21

the same add-to-DCURR thing that'll need to happen for varargs, I guess.

Actually, results are zero-initialized in Spin2, too, right? so it'd have to be bunch of zero constants before the parameter expressions. That's smaller than adding to DCURR for small counts, anyways.

Jun 10 '21 20:06 Wuerfel21

There's a new test script, Test/runtests_bc.sh, for testing byte code. I've disabled the C and BASIC tests (we know they won't work). Quite a few of the Spin tests do work, and one of them showed a bug that I've fixed (converting division into right shift fails for negative numbers that are not powers of 2).

Of the remaining failing tests, I think the big missing features are multiple assignment (exec03) and method/object pointers (exec04). exec11 is a timing test so its failure is entirely expected (and I should add an #ifdef for bytecode output so we can ignore it).

Jun 11 '21 11:06 totalspectrum

multi-assign between variables should be easy, I think? Unless there's a silly Spin2 quirk that makes it stupid, that is. Just eval all the expressions, then the stores in reverse order. Not sure if that's the right eval order for the right hand though.

Calling through pointers is a problem with the P1 ROM interpreter. The obvious idea is to create a dummy header entry and rewrite it's offset before the call, but

not thread/multicore safe
could be made multicore safe by using 8 dummies and indexing with cogid, but that's lame.
function header entries need to know the local size

For object pointers, I think just writing them to VBASE between the call parameters and the call opcode would work? That'd be thread-safe and probably less opcodes. Would still need a dummy header entry with zero offset.

C-style Function pointers, uhhhh, idk. Again, need to know the local size to call a function. I guess make it like a Spin2 method pointer, but with PBASE instead of VBASE

Spin2-style method pointers would need quite a bit of helper code to work on P1 and will require implementing Spin2-style RTTI (wherein the first VAR long of an object gets PBASE written to it when taking a method pointer). There'll need to be a special helper function to handle calls to a method pointer that

sets VBASE from the method pointer
sets PBASE from the RTTI pointer (now at VBASE+0)
indexes into PBASE with the ID part of the method pointer to get the method's header entry
allocates the locals
jumps into the method

I think the pointer will need to be passed in a register, because it simplifies the helper function a lot.

I think this'd work:

PUB __methodptr_helper
' doesn't declare parameters, but will be called with whatever the target function expects 
  __interp_vbase := inb[0..15]
  __interp_pbase := long[__interp_vbase][0] ' There is no syntax for reading the variable at a given offset. I guess we could just optimize this workaround.
  result := long[__interp_pbase][inb[20..31]] ' This is where Spin2 keeps the method ID, but for P1 it'd be more sensible to be at 24..31
  __interp_dcurr += result.word[1]~ ' allocate locals
  inb := 0
  __interp_pcurr := result.word[0]~ + __interp_pbase ' jump into function

' Same thing but for function pointer
PUB __funcptr_helper
  __interp_pbase := inb[0..15]
  result := long[__interp_pbase][inb[20..31]]
  __interp_dcurr += result.word[1]~ ' allocate locals
  inb := 0
  __interp_pcurr := result.word[0]~ + __interp_pbase ' jump into function

Jun 11 '21 12:06 Wuerfel21

Also, I think:tm: the proper fix for the division to right-shift would be change the else if (shiftOptOp && isPowerOf2(rightVal)) further down into else if (shiftOptOp && rightVal > 0 && isPowerOf2(rightVal))

Jun 11 '21 12:06 Wuerfel21

Dave Hein created a method pointer class in Spin1 that used 4 words to store pcurr, vbase, stack variable offset, and pbase. This aligns pretty well with how the assembly code handles it (it uses 2 longs instead of 4 words, but same amount of space).

Jun 11 '21 13:06 totalspectrum

The division right shift issue is a problem if the left hand side is negative. That is, -1 SHR 1 is still -1, whereas -1 / 2 should give 0. In the assembly for x / 2^N we generate something like (x < 0) ? -(ABS(x)>>N) : (x>>N), but I'm not sure if it's worth doing that for bytecode? I guess it's quite a bit faster than division, but it'll consume a lot more space, too.

Jun 11 '21 13:06 totalspectrum

There's problems with that approach though

Spin2 interpeter supports method pointers natively, so using similar on P1 seems appropriate
Doesn't fit in a single long
Need to know the local size of the called function (which would prevent the aforementioned idea of loading objects at runtime)
More code to init each pointer

Jun 11 '21 13:06 Wuerfel21

The division right shift issue is a problem if the left hand side is negative. That is, -1 SHR 1 is still -1, whereas -1 / 2 should give 0. In the assembly for x / 2^N we generate something like (x < 0) ? -(ABS(x)>>N) : (x>>N), but I'm not sure if it's worth doing that for bytecode? I guess it's quite a bit faster than division, but it'll consume a lot more space, too.

Ahh, of course.

Doing that workaround is likely not (much) faster than an actual divide and doesn't work as an assignment, so yeah, probably just remove it. Keep the no-op optimization though.

Jun 11 '21 14:06 Wuerfel21

Here's another idea for function/method pointers: they could always point at the descriptor for the function inside the object (the one with the dcurr and pcurr offsets). Immediately after any function whose address was taken we would insert a dummy function descriptor which points back at pbase. The function pointer itself would then fit in 16 bits, and for method pointers we'd need an additional 16 bits for vbase.

Jun 12 '21 01:06 totalspectrum

That could work well, yeah. Any major advantage over the Spin2-like approach though?

Jun 12 '21 09:06 Wuerfel21

Unrelatedly, I think I'm going to look into jump table CASE now. Saying that so we don't step on our feet.

Jun 12 '21 13:06 Wuerfel21

Sounds good, thanks. I'm going to take a stab at function/method pointers, so as to get C/BASIC printing to work (the file handles use a table of pointers).

Jun 12 '21 13:06 totalspectrum

Regarding the zero-extend operator, it'd be useful to optimize constant zero-extend into an AND. Can you add that to OptimizeOperator? Also, setting unary to have it not compile the expression again seems really jank, I think it'd be better to just push the SHR/SAR op in the case and return, like BOOL_AND does.

Jun 12 '21 13:06 Wuerfel21

Also, I think it's just implemented wrong to begin with, shouldn't the shift amount be 31-x or something?

Jun 12 '21 13:06 Wuerfel21

Also, the current implementation for jump table generation relies on arbitrary GOTO and there being a variable holding the expression result. The former needs to be implemented for C/BASIC, anyways, the latter I'll see if I can work around.

Jun 12 '21 14:06 Wuerfel21

RE: the zeroext thing

As I said, the zeroext to AND transform should really be in OptimizeOperator, since that way it can apply to assignments. Also, if you want a constant, you can just use BCCompileInteger instead of allocating an integer and running it through BCCompileExpression

Jun 12 '21 16:06 Wuerfel21

I've got the jump tables implemented, will finish up tomorrow. Is in my BC-casefast branch. Main problem RN is that the jump table index needs unsigned min/max to be computed. I've built an optimization that converts unsigned ops to signed ones where it makes sense, but right now that only works if the first case is zero (because otherwise a subtraction is inserted) and the expression is explicitly masked (#157 blocks it from detecting small unsigned types).

But I think the actual solution is to insert an additional OTHER case at the beginning of a table, such that an expression like ((x-offset)<#max)>#0 can be used.

The unsigned->signed optimization will be neat for C though, I guess.

Jun 12 '21 23:06 Wuerfel21

Yes, C is going to need a bunch of unsigned operators that Spin1 doesn't. They'll come in handy for Spin2 as well.

Jun 12 '21 23:06 totalspectrum

I've implemented some simplified I/O and enough unsigned operators to get simple BASIC and C "hello world" style programs working. The unsigned operators are converted to function calls for now; we may be able to figure out better ways to implement them in the future.

Jun 13 '21 17:06 totalspectrum

spin2cpp spin2cpp copied to clipboard

Bytecode further work tracking issue

spin2cpp
spin2cpp copied to clipboard