Luke Gorrie
I have extracted one hot trace for a (hopefully) apples-to-apples comparison between GC32 and GC64 mode here: https://gist.github.com/lukego/c6a7990290890c9a9f8e1b8c86cf905e. This trace accounts for ~20% of the application's total CPU time.
How to test the hypothesis that the issue is register pressure? The simplest experiment I have come up with so far is to make LuaJIT GC32 mode also reserve `RID_DISPATCH`...
> references to the global_State/dispatch table are no longer encodable as immediates

Interesting observation. In that case adding register pressure to GC32 mode might not cause the same problem. But...
@wingo I'm trying to understand why we have those LEAs. It looks like each one is only used once. (I'm guessing they are `EQ` guards checking the identity of objects.)...
(Oh, sorry, stupid question, only one is dereferencing...)
News: The patch above (reserve `RID_DISPATCH` in GC32 mode) did not impact performance. That is evidence against the hypothesis that the root problem is r14 being unavailable for mcode.
@wingo Some of these offsets are quite large, e.g. 0x18d5c90 (~26M). So I don't think these are actually globals but just objects whose addresses happen to be encodable as a...
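To make the "encodable" point concrete, here is a minimal sketch of the check involved, assuming the relevant limit is the sign-extended 32-bit field that x86-64 uses for immediates and displacements. The helper name `fits_in_imm32` and the high address are hypothetical, chosen only for illustration; only 0x18d5c90 comes from the trace.

```python
def fits_in_imm32(addr: int) -> bool:
    """True if addr is representable as a sign-extended 32-bit field."""
    return -(1 << 31) <= addr < (1 << 31)

# Offset seen in the trace: large for a "global", but still well below 2^31.
trace_offset = 0x18d5c90

# Hypothetical address of the kind a 64-bit heap can hand out (illustration only).
high_address = 0x7f3a1c000000

print(hex(trace_offset), fits_in_imm32(trace_offset))
print(hex(high_address), fits_in_imm32(high_address))
```

Under this assumption, any object that happens to live in the low 2 GB can be referenced directly from an instruction, whether or not it is a global. An address that fails the check first has to be loaded into a register.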
@wingo The reason we see so many loads of constants clustered at the start of the machine code seems to be a peculiarity of the assembler: 1. Code is assembled...
Loading Lua values from memory, checking their types, and masking their values also use different instructions in GC64 than in GC32. This might naturally change (increase) the number of machine code...
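A rough model of why the value handling differs, written as a sketch rather than a description of the actual mcode. It assumes the GC64 layout packs a 17-bit type tag above a 47-bit pointer in one 64-bit word, while GC32 keeps the tag and the pointer in two separate 32-bit words. The tag constant and the function names are illustrative, not taken from the LuaJIT sources.

```python
MASK64 = (1 << 64) - 1
GCV_MASK = (1 << 47) - 1   # assumed: low 47 bits carry the GC pointer
TAG_TAB = 0x1FFF4          # illustrative 17-bit tag value

def pack_gc64(tag17: int, ptr: int) -> int:
    """Pack tag and pointer into a single 64-bit word (assumed GC64 layout)."""
    return ((tag17 << 47) | (ptr & GCV_MASK)) & MASK64

def unpack_gc64(tv: int) -> tuple[int, int]:
    """Type check needs a shift; pointer use needs a mask."""
    return tv >> 47, tv & GCV_MASK

def pack_gc32(tag32: int, ptr32: int) -> tuple[int, int]:
    """GC32 model: tag and pointer are separate words, no shift or mask."""
    return tag32, ptr32

tv = pack_gc64(TAG_TAB, 0x18d5c90)
tag, ptr = unpack_gc64(tv)
print(hex(tv), hex(tag), hex(ptr))
```

In this model the GC32 tag can be compared directly in memory and the pointer used as loaded, while the GC64 word needs an extra shift or mask before either half is usable. Whether that accounts for the measured difference is a separate question.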
This is a hard nut to crack. I'm not immediately sure how to account for the performance difference given the data we have now. I'm also not sure what experiment...