TIC-80 icon indicating copy to clipboard operation
TIC-80 copied to clipboard

LUA sethook slows things down

Open vsariola opened this issue 4 years ago • 19 comments

The LUA debug hook mechanism is used to prevent infinite loops / allowing exiting LUA script. I'm talking about the hook here:

https://github.com/nesbox/TIC-80/blob/60e63de918ae03bc9223097cc1369459d94cbb99/src/api/lua.c#L1382

The count is set pretty high, so that's fine. Unfortunately, if the count hook is enabled, LUA still calls luaG_traceexec every instruction. luaG_traceexec just decreases the current count and returns if it has not reached 0 yet. Even though this is relatively cheap, it is still very much in the hottest path of the VM, so the luaG_traceexec calls actually costs some 10-20% of the CPU time (!) consumed TIC-80 when doing heavy stuff in LUA, according to profiling.

I totally understand why the sethook is needed (to prevent accidental data loss), but slowing down TIC-80 so much just for this seems a bit excessive.

There's a few options:

  1. Keep it as it is. Users can prevent the hook by invoking debug.sethook() themselves, to gain the extra speed. The problem with this approach is that how many will learn / know this arcane spell?
  2. Add the hook only when the cart is run from the studio. Presumably, that's when there's development going on; if someone is playing a game, a crash is not any worse than exiting to console as they lose their game progress both ways.
  3. Add the hook only when there are unsaved changes to the cart.
  4. Find another mechanism to prevent infinite loops e.g. another thread that can kill the LUA if it takes too long.

vsariola avatar Oct 11 '21 15:10 vsariola

Is there a reason this is so expensive? Can it be inlined? Reducing a counter each tick should not be an expensive operation.

joshgoebel avatar Oct 17 '21 20:10 joshgoebel

I can try to profile more and/or try compile options, but bear in mind that most VM instructions ain't doing too much beyond adding, subtracting etc. a few numbers. So, the small counter decrement and possible branch might still add up.

vsariola avatar Oct 17 '21 20:10 vsariola

That's why I asked if it's inlined. There should be no need for any branching or function calls. It should - you would think - just increment inside the main execution loop.

joshgoebel avatar Oct 17 '21 21:10 joshgoebel

I cannot get Microsoft compiler to inline it; presumably because luaG_traceexec is too long overall:

https://github.com/lua/lua/blob/75ea9ccbea7c4886f30da147fb67b693b2624c26/ldebug.c#L667

The lines that get run every VM instruction are:

void luaG_traceexec (lua_State *L) {
  CallInfo *ci = L->ci;
  lu_byte mask = L->hookmask;
  int counthook = (--L->hookcount == 0 && (mask & LUA_MASKCOUNT));
  if (counthook)
    resethookcount(L);  /* reset count */
  else if (!(mask & LUA_MASKLINE))
    return;  /* no line hook and count != 0; nothing to be done */
...

Those become ~ 20 assembly instructions (movs, push, sub, and, xor etc.). After that, there's quite a bit of code that run the hook if needed, which is why I don't think the compiler likes to inline it, even when compiling with "inline anything suitable".

LUA can be forked and luaG_traceexec broken into two parts, to try to get compiler inline the top part. That's going to save some of the instructions (call / ret, some shadow space setup on windows), but it's a lot of work for something that still probably leaves some overhead.

vsariola avatar Oct 18 '21 11:10 vsariola

I removed the infinite loop checking here 1a4e61a and it works significantly faster now, but we have to implement a new infinite loop checking mechanism...

nesbox avatar Nov 01 '21 11:11 nesbox

Perhaps we should have just placed it all behind the CHECK_FORCE_EXIT compile flag, as was already done in a few places? Did we confirm the same measurable slowdown with the other engines that we're seeing with Lua?

joshgoebel avatar Nov 05 '21 05:11 joshgoebel

Or we can add a special dev mode where we will check the infinite loops, it could be a button in the Code Editor. By default, all carts run without infinite loops checking. I see it like this.

nesbox avatar Nov 05 '21 11:11 nesbox

Or we can add a special dev mode where we will check the infinite loops, it could be a button in the Code Editor.

That's great idea, but could we figure it out automatically somehow? Do we not know when someone is "working on a cartridge" vs "playing a published cartridge"?

joshgoebel avatar Nov 05 '21 12:11 joshgoebel

Don't see a normal way, even if you run a game from the Surf, dev mode is disabled, the user can go to the Code Editor, add code with inf loop, run and stuck.

nesbox avatar Nov 05 '21 12:11 nesbox

the user can go to the Code Editor

Why couldn't this turn on dev mode automagically? Users just playing a game (esp a published one) shouldn't need to access the code editor at all...

joshgoebel avatar Nov 05 '21 12:11 joshgoebel

Ok, a user downloads a cart from the website and runs it locally, here all is ok, dev mode disabled. A developer continues work on his WIP project runs the cart locally too without going to the Code Editor and gets stuck because dev mode is disabled.

nesbox avatar Nov 05 '21 12:11 nesbox

@nesbox Do you have any thoughts/preferences on how we might actually patch the vendored software instead of purely vendoring pristine copies? Are we ok with small targeted patches to achieve specific goals? Should we just clone the repos and then point the gitsubmodules there, or would you prefer tiny patch files that we perhaps applied during the build process? Tiny patch files sound easier day to day but editing them once in a while might be a bit of a chore. (not sure if CMAKE has a provision for patches, but I'm hope so)

I was just reading the Lua source and the fastest way (at a glance) to accomplish this would be to:

  • Add a ticker or maxTicks U32 to the LuaState
  • Add a func_toolong callback variable to LuaState
  • We reset the ticker before we jump into lua
  • Add ticker increment/check to OP_FORLOOP, OP_TFORLOOP, and OP_JUMP
  • If we have no more ticks left, then bail.

JUMP is required because while and until loops are implemented with a test/jmp combo... So the expense should boil down to two or two assembly instructions (increment, test, jump)... and even then only for loop and jump opcodes. I think this is about as low impact as you can get while still doing the accounting.

I'm imagining that breaking out of stuck loops is valuable per-language... so if we added back support one at a time that would still be a good thing overall rather than not having the feature at all. I know Lua is pretty popular so it makes sense to start there first I think.

joshgoebel avatar Jan 07 '22 20:01 joshgoebel

% git diff --stat                                 3.0.1
 lstate.h |  3 +++
 lvm.c    | 14 +++++++++++++-
 2 files changed, 16 insertions(+), 1 deletion(-)

17 line patch against Lua. Works great. My system breaks out of a loop at the 50,000,000 iteration point in a barely perceptible amount of time. I don't think we need to poll for ESC... I'd rather just always crash and pick a number high enough that we're certain it's a stuck loop. Remember while we're in the loop the UI doesn't get updated either (no mouse movement, etc), so it can look like a complete freeze and the person might not even think of hitting ESC.

@vsariola What were you using to do your profiling? I can push my changes and give you the Lua patch if you'd like to test it...

Note: That's an empty loop though, so 50,000,000 is probably way too high... some actual testing could easily find a reasonable limit I'm sure.

joshgoebel avatar Jan 07 '22 21:01 joshgoebel

I profiled with the Visual Studio Community 2019. I can try the patch when I'm at the computer again.

vsariola avatar Jan 08 '22 07:01 vsariola

Do you have any thoughts/preferences on how we might actually patch the vendored software

We can fork the repo, apply a patch, and point the git submodule to the forked repo, as a variant.

nesbox avatar Jan 08 '22 09:01 nesbox

I profiled with the Visual Studio Community 2019. I can try the patch when I'm at the computer again.

Well, it's not as simple as I thought, but you should still be able to test the performance... turns out using coroutines create an entirely new Lua state which you also have to pass the tick counter into... lucky Lua has a routine for passing state from state to state... but I've been seeing inconsistent results with my benchmark app (which mind you is a worse case since it's purposely VERY slow with lots of loops)... It's also possible I should just raise the limit. Perhaps if it locks up for 5 seconds that's better than permanently - or we could also go back to requiring the ESC key to be pressed.

I might post a patch here on a Gist later or something so you can test, but I'm going to test more before I post a PR.

joshgoebel avatar Jan 08 '22 16:01 joshgoebel

https://gist.github.com/joshgoebel/7ad81dbd41e858f6daeb805ddabfee4c

joshgoebel avatar Jan 08 '22 16:01 joshgoebel

@joshgoebel would you be able to turn this into a proper PR to test?

aliceisjustplaying avatar May 02 '24 16:05 aliceisjustplaying

No time, sorry. But the patch should be easy enough to apply.

joshgoebel avatar May 03 '24 15:05 joshgoebel