TIC-80
                                
                                 TIC-80 copied to clipboard
                                
                                    TIC-80 copied to clipboard
                            
                            
                            
                        LUA sethook slows things down
The LUA debug hook mechanism is used to prevent infinite loops / allowing exiting LUA script. I'm talking about the hook here:
https://github.com/nesbox/TIC-80/blob/60e63de918ae03bc9223097cc1369459d94cbb99/src/api/lua.c#L1382
The count is set pretty high, so that's fine. Unfortunately, if the count hook is enabled, LUA still calls luaG_traceexec every instruction. luaG_traceexec just decreases the current count and returns if it has not reached 0 yet. Even though this is relatively cheap, it is still very much in the hottest path of the VM, so the luaG_traceexec calls actually costs some 10-20% of the CPU time (!) consumed TIC-80 when doing heavy stuff in LUA, according to profiling.
I totally understand why the sethook is needed (to prevent accidental data loss), but slowing down TIC-80 so much just for this seems a bit excessive.
There's a few options:
- Keep it as it is. Users can prevent the hook by invoking debug.sethook() themselves, to gain the extra speed. The problem with this approach is that how many will learn / know this arcane spell?
- Add the hook only when the cart is run from the studio. Presumably, that's when there's development going on; if someone is playing a game, a crash is not any worse than exiting to console as they lose their game progress both ways.
- Add the hook only when there are unsaved changes to the cart.
- Find another mechanism to prevent infinite loops e.g. another thread that can kill the LUA if it takes too long.
Is there a reason this is so expensive? Can it be inlined? Reducing a counter each tick should not be an expensive operation.
I can try to profile more and/or try compile options, but bear in mind that most VM instructions ain't doing too much beyond adding, subtracting etc. a few numbers. So, the small counter decrement and possible branch might still add up.
That's why I asked if it's inlined. There should be no need for any branching or function calls. It should - you would think - just increment inside the main execution loop.
I cannot get Microsoft compiler to inline it; presumably because luaG_traceexec is too long overall:
https://github.com/lua/lua/blob/75ea9ccbea7c4886f30da147fb67b693b2624c26/ldebug.c#L667
The lines that get run every VM instruction are:
void luaG_traceexec (lua_State *L) {
  CallInfo *ci = L->ci;
  lu_byte mask = L->hookmask;
  int counthook = (--L->hookcount == 0 && (mask & LUA_MASKCOUNT));
  if (counthook)
    resethookcount(L);  /* reset count */
  else if (!(mask & LUA_MASKLINE))
    return;  /* no line hook and count != 0; nothing to be done */
...
Those become ~ 20 assembly instructions (movs, push, sub, and, xor etc.). After that, there's quite a bit of code that run the hook if needed, which is why I don't think the compiler likes to inline it, even when compiling with "inline anything suitable".
LUA can be forked and luaG_traceexec broken into two parts, to try to get compiler inline the top part. That's going to save some of the instructions (call / ret, some shadow space setup on windows), but it's a lot of work for something that still probably leaves some overhead.
I removed the infinite loop checking here 1a4e61a and it works significantly faster now, but we have to implement a new infinite loop checking mechanism...
Perhaps we should have just placed it all behind the CHECK_FORCE_EXIT compile flag, as was already done in a few places?  Did we confirm the same measurable slowdown with the other engines that we're seeing with Lua?
Or we can add a special dev mode where we will check the infinite loops, it could be a button in the Code Editor.
By default, all carts run without infinite loops checking.
I see it like this.
Or we can add a special dev mode where we will check the infinite loops, it could be a button in the Code Editor.
That's great idea, but could we figure it out automatically somehow? Do we not know when someone is "working on a cartridge" vs "playing a published cartridge"?
Don't see a normal way, even if you run a game from the Surf, dev mode is disabled, the user can go to the Code Editor, add code with inf loop, run and stuck.
the user can go to the Code Editor
Why couldn't this turn on dev mode automagically? Users just playing a game (esp a published one) shouldn't need to access the code editor at all...
Ok, a user downloads a cart from the website and runs it locally, here all is ok, dev mode disabled. A developer continues work on his WIP project runs the cart locally too without going to the Code Editor and gets stuck because dev mode is disabled.
@nesbox Do you have any thoughts/preferences on how we might actually patch the vendored software instead of purely vendoring pristine copies? Are we ok with small targeted patches to achieve specific goals? Should we just clone the repos and then point the gitsubmodules there, or would you prefer tiny patch files that we perhaps applied during the build process? Tiny patch files sound easier day to day but editing them once in a while might be a bit of a chore. (not sure if CMAKE has a provision for patches, but I'm hope so)
I was just reading the Lua source and the fastest way (at a glance) to accomplish this would be to:
- Add a tickerormaxTicksU32 to the LuaState
- Add a func_toolongcallback variable to LuaState
- We reset the ticker before we jump into lua
- Add ticker increment/check to OP_FORLOOP,OP_TFORLOOP, andOP_JUMP
- If we have no more ticks left, then bail.
JUMP is required because while and until loops are implemented with a test/jmp combo... So the expense should boil down to two or two assembly instructions (increment, test, jump)... and even then only for loop and jump opcodes. I think this is about as low impact as you can get while still doing the accounting.
I'm imagining that breaking out of stuck loops is valuable per-language... so if we added back support one at a time that would still be a good thing overall rather than not having the feature at all. I know Lua is pretty popular so it makes sense to start there first I think.
% git diff --stat                                 3.0.1
 lstate.h |  3 +++
 lvm.c    | 14 +++++++++++++-
 2 files changed, 16 insertions(+), 1 deletion(-)
17 line patch against Lua. Works great. My system breaks out of a loop at the 50,000,000 iteration point in a barely perceptible amount of time. I don't think we need to poll for ESC... I'd rather just always crash and pick a number high enough that we're certain it's a stuck loop. Remember while we're in the loop the UI doesn't get updated either (no mouse movement, etc), so it can look like a complete freeze and the person might not even think of hitting ESC.
@vsariola What were you using to do your profiling? I can push my changes and give you the Lua patch if you'd like to test it...
Note: That's an empty loop though, so 50,000,000 is probably way too high... some actual testing could easily find a reasonable limit I'm sure.
I profiled with the Visual Studio Community 2019. I can try the patch when I'm at the computer again.
Do you have any thoughts/preferences on how we might actually patch the vendored software
We can fork the repo, apply a patch, and point the git submodule to the forked repo, as a variant.
I profiled with the Visual Studio Community 2019. I can try the patch when I'm at the computer again.
Well, it's not as simple as I thought, but you should still be able to test the performance... turns out using coroutines create an entirely new Lua state which you also have to pass the tick counter into... lucky Lua has a routine for passing state from state to state... but I've been seeing inconsistent results with my benchmark app (which mind you is a worse case since it's purposely VERY slow with lots of loops)... It's also possible I should just raise the limit. Perhaps if it locks up for 5 seconds that's better than permanently - or we could also go back to requiring the ESC key to be pressed.
I might post a patch here on a Gist later or something so you can test, but I'm going to test more before I post a PR.
https://gist.github.com/joshgoebel/7ad81dbd41e858f6daeb805ddabfee4c
@joshgoebel would you be able to turn this into a proper PR to test?
No time, sorry. But the patch should be easy enough to apply.