Prevent infinite loops from crashing the entire system

Open BrunoDSL opened this issue 3 years ago • 13 comments

So far, I've experienced this issue twice today on both Android and a Pro build compiled from the repo in Manjaro. I'm not sure exactly what caused both, since the code from both was lost, but certain cases of while loop usage (most likely improper ones) can hard lock TIC-80 to the point you have to force quit it. Perhaps the machine runs out of memory after being inundated with calls?

Maybe TIC-80 should have some form of protection against it?

BrunoDSL avatar Apr 11 '21 05:04 BrunoDSL

A reproducible example would be great. Does while(true) {} have the problem?

joshgoebel avatar Apr 11 '21 05:04 joshgoebel

Here's a cart with a JavaScript while(true){}. jsdemo.tic.zip

I'm not one for baby-sitting bad code 🤷

RobLoach avatar Apr 11 '21 06:04 RobLoach

I could have whipped up a test myself if I wanted. Does it hard lock? :-)

> I'm not one for baby-sitting bad code 🤷

I generally agree, but crashing entirely is still pretty bad... I'm not sure how you would easily catch something like this, though, unless the runtime itself had something nice built in. I suppose you could periodically check how much time has elapsed, and if you're WAY over the amount of time to render a frame, just do a crash and output a stack trace. Frames are supposed to take 16ms (tops) for 60FPS... but if you're taking 0.5 seconds, you're doing something VERY wrong...
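A minimal sketch of that elapsed-time check, written in JavaScript like the cart above. Every name here (`makeWatchdog`, `runFrame`, `CartStuckError`, the 500 ms limit) is hypothetical, and this is not how TIC-80 is implemented; it only illustrates the idea of failing safely with an error instead of hanging.

```javascript
// Hypothetical names throughout; this is a sketch, not TIC-80's implementation.
const STUCK_LIMIT_MS = 500; // "WAY over" a 16 ms frame: treat the cart as stuck

class CartStuckError extends Error {}

// Returns a check function that the runtime would call periodically,
// for example once per loop iteration or every N instructions.
function makeWatchdog(limitMs = STUCK_LIMIT_MS, now = Date.now) {
  const start = now();
  return function check() {
    const elapsed = now() - start;
    if (elapsed > limitMs) {
      throw new CartStuckError(`frame took ${elapsed} ms (limit ${limitMs} ms)`);
    }
  };
}

// Runs one frame of cart code and reports a stuck cart instead of hanging.
function runFrame(cartFrame, limitMs = STUCK_LIMIT_MS) {
  const check = makeWatchdog(limitMs);
  try {
    cartFrame(check);
    return { ok: true };
  } catch (err) {
    if (err instanceof CartStuckError) {
      // A console could print this message and stack trace to the user.
      return { ok: false, error: err.message, stack: err.stack };
    }
    throw err; // unrelated errors are not swallowed
  }
}
```

Note that the check is cooperative: a bare `while(true){}` never calls it, so a real runtime would have to inject the call into loops or rely on a hook provided by the scripting engine.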

joshgoebel avatar Apr 11 '21 07:04 joshgoebel

I haven't looked at the code, but I'd assume/hope all the runtimes have some type of "run x cycles" provision vs. "run forever and cross your fingers"? Maybe it's finally time for a reasonable CPU bound. :)
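To make the "run x cycles" idea concrete, here is a small JavaScript sketch. It assumes the cart can be modelled as a generator that yields once per cycle; real scripting engines expose this differently (Lua's C API, for instance, has an instruction-count hook via `lua_sethook`). The names `runWithBudget`, `endlessCart`, and `shortCart` are made up for illustration.

```javascript
// Hypothetical host-side scheduler. The cart is modelled as a generator
// that yields once per "cycle" so the host can count steps.
function runWithBudget(cart, maxCycles) {
  let cycles = 0;
  while (cycles < maxCycles) {
    const step = cart.next();
    if (step.done) return { finished: true, cycles };
    cycles++;
  }
  // Budget exhausted: the host gets control back and can report an error.
  return { finished: false, cycles };
}

// A cart that never terminates on its own.
function* endlessCart() {
  while (true) yield;
}

// A cart that finishes after three cycles.
function* shortCart() {
  for (let i = 0; i < 3; i++) yield;
}
```

With this shape, `runWithBudget(endlessCart(), 1000)` returns after 1000 cycles instead of locking up, and the host decides what to do next.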

joshgoebel avatar Apr 11 '21 07:04 joshgoebel

> Here's a cart with a JavaScript while(true){}. jsdemo.tic.zip
>
> I'm not one for baby-sitting bad code 🤷

But then... Is the machine hard locking because your code is bad? A bug? Some quirk with how stuff is implemented? Unless you save your work beforehand (which you often don't when you're just whipping up a quick idea) or have some form of debugging log or similar, you can't even provide examples for bug reports unless you remember the code.

BrunoDSL avatar Apr 11 '21 12:04 BrunoDSL

As far as I remember, only Lua (and Lua-based scripts) has infinite loop protection; try holding the ESC button to exit from the loop. Other scripts don't handle this at the moment.

nesbox avatar Apr 12 '21 10:04 nesbox

@nesbox Do the runtimes not offer a stepping function to allow you to drive/control the VM? Surely they do.

joshgoebel avatar Apr 12 '21 10:04 joshgoebel

Yes, they do, but we use it in Lua only: https://github.com/nesbox/TIC-80/blob/master/src/api/lua.c#L1270

nesbox avatar Apr 12 '21 10:04 nesbox

I might play around with this later... it'd be nice if it just safely crashed (with a console error) if it looked like it was "truly stuck"... what is one of the most CPU hungry carts we have so I have something to test against to see what reasonable limits might be?

joshgoebel avatar Apr 12 '21 11:04 joshgoebel

It's odd, because several languages compile down to Lua but don't get the same protection.

ChildishGiant avatar Aug 31 '21 10:08 ChildishGiant

So @lenaschimmel recently encountered this problem in the web version, where an endless loop locked the console, and she lost all work she'd done so far. :/ This is a terrible first-time experience.

When I just tried it, holding ESC to exit a loop didn't seem to work in either the web version or the Linux version.

Other than implementing a way to quit a loop like that, how could we prevent work from being lost? One proposal: save a temporary copy of the current cart each time before running it, and add a way to restore it later. I'm not entirely sure what the resume command does right now, but I'd expect it to also restore unsaved carts after a crash. If people think this would be good, we could open a new issue. :)
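A rough sketch of that proposal in JavaScript, assuming a Node.js-like environment with filesystem access (the web version would need a different storage backend). The file name `unsaved-cart.bak` and the functions `backupBeforeRun` and `restoreBackup` are hypothetical and do not correspond to any existing TIC-80 command.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical backup location; a real console would pick its own.
function backupPathFor(dir) {
  return path.join(dir, "unsaved-cart.bak");
}

// Called right before the cart is run.
function backupBeforeRun(cartSource, dir = os.tmpdir()) {
  const target = backupPathFor(dir);
  const tmp = target + ".tmp";
  fs.writeFileSync(tmp, cartSource, "utf8");
  // Replace the old backup only once the new one is fully written.
  fs.renameSync(tmp, target);
  return target;
}

// Called on startup (or by a restore command) after a crash.
// Returns the backed-up source, or null if there is no backup.
function restoreBackup(dir = os.tmpdir()) {
  const target = backupPathFor(dir);
  return fs.existsSync(target) ? fs.readFileSync(target, "utf8") : null;
}
```

Writing to a temporary file and renaming it means a crash in the middle of the backup leaves the previous backup in place rather than a half-written file.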

blinry avatar Apr 10 '22 10:04 blinry

I just tried it with version 0.90.1724 Pro (8e9dfe2) on macOS and can confirm that while(true) hard locks, holding ESC does not help, and the TIC window cannot even be closed by clicking the close button in the window title bar.

Specific protection against infinite loops, runaway recursion, or very long computations is probably nice to have, but I would argue that it is a secondary concern compared to general data-loss protection. Looking through older issues, it seems that this specific problem has been reported, fixed, and then broken again before. Also, there's an infinite number of other ways that users could write bad code (or even good code with a small but consequential bug in it) that either results in infinite computation or breaks something else. I don't know if TIC-80 is, in theory, safeguarded against rogue pokes into memory that mess up the integrated IDE, etc.

I've read the autosave discussion #221, and to me it seems to be about game playing state (i.e., how many levels the player has solved) in contrast to game development state (i.e., the code, sprite data, and map data of the game). With TIC-80 being somewhat self-contained, I don't know if those are two completely different problems, somewhat overlapping, or even identical. If code were saved or backed up automatically, it would protect against all those known and unknown error conditions.

To anyone who says "save your code, or deal with it", I want to say that TIC-80 is a rather rare combination of three aspects:

  • code can be edited within the system that executes it
  • code can be run without saving it first
  • code execution can crash the system

> This is a terrible first-time experience.

That's true. I almost gave up on TIC-80 after this happened to me. I'm glad that I didn't; I finished my first TIC-80 game and I'm looking forward to making more of them. But yeah, I can totally imagine how this could put off first-time users.

lenaschimmel avatar Apr 10 '22 11:04 lenaschimmel

I agree both items mentioned here are important, but I think preserving the state of the editor in the case of a crash is probably the more important one: who knows why the system might crash, and that would cover many other cases as well.

In other words, while you're working the editor should always be treated as if it were a temporary file that's saved on your system somewhere. If someone restarts after a crash the previous temporary file should be restored. That should really probably be opened as a separate issue though.

joshgoebel avatar Apr 10 '22 11:04 joshgoebel

Hi! I'm here just to say that I lost many hours' worth of work because of code similar to this (which I'm recalling from memory):

repeat
  k = math.ceil(math.random()*#units)
until units[k].hp > 0

when all units were dead.
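The loop never terminates once every unit has hp <= 0, because no index can ever satisfy the exit condition. One way to avoid that, sketched here in JavaScript (which TIC-80 carts can also be written in), is to filter the living units first and handle the empty case explicitly. The function name `pickLivingUnit` and the `hp` field layout are assumptions based on the snippet above.

```javascript
// Returns a random unit with hp > 0, or null when none is left.
// `random` is injectable so the choice can be made deterministic in tests.
function pickLivingUnit(units, random = Math.random) {
  const living = units.filter((u) => u.hp > 0);
  if (living.length === 0) return null; // the case that hung the original loop
  return living[Math.floor(random() * living.length)];
}
```

The caller then has to decide what "no living units" means for the game, which is usually the point where the bug becomes visible instead of freezing the console.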

From now on, I will try to remember to save the file more often, but I think it's very harsh to lose all your work because of an error that can easily happen in game design and can be very obscure, especially if you are just learning.

I'm using 1.0.2243-dev on Arch Linux. ESC key did not help.

I agree that if the current working file were recoverable, having to kill the process would not be so bad, and it might even be educational.

Sorry for the catharsis. I will now start gathering the will to redo the lost pieces.

franalbani avatar Oct 15 '22 01:10 franalbani

hi, i opened a duplicated issue... i am not an expert but have some ideas like...

  • designate a specific byte in ram memory address to put a flag (either "emulated" ram or "hypervisor" ram)
  • spawn another thread that only checks for esc key pressed, and set this byte when found pressed
  • internally modify every while, for, goto, repeat, or any potential loop or jump instruction to also check this flag, to exit program if set... similarly as exit in non stuck conditions (could call same internal function)

could also modify internal loop/jump functions to check for esc key themselves directly, avoiding thus to spawn another thread and reserve a byte to do this job
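a rough javascript sketch of the flag idea from the list above. all names (`abortFlag`, `requestAbort`, `checkAbort`, `runStuckCart`) are hypothetical and nothing here is actual TIC-80 code. the flag lives in a SharedArrayBuffer because that is what a separate key-watching thread could share with the runtime; in this demo the "esc press" is simulated inside the loop so the example stays single-threaded.

```javascript
// Hypothetical names; a sketch of the "flag byte" idea, not TIC-80 code.
// A SharedArrayBuffer could be shared with a worker thread watching the keyboard.
const abortFlag = new Int32Array(new SharedArrayBuffer(4));

class CartAborted extends Error {}

// What a key-watching thread would call when it sees ESC pressed.
function requestAbort() {
  Atomics.store(abortFlag, 0, 1);
}

// What the runtime would inject into every loop or jump.
function checkAbort() {
  if (Atomics.load(abortFlag, 0) === 1) {
    Atomics.store(abortFlag, 0, 0); // reset the flag for the next run
    throw new CartAborted("cart interrupted by user");
  }
}

// Demo: a loop that would never end on its own, with the injected check.
// Returns the number of iterations completed before the interrupt.
function runStuckCart(pressEscAfter) {
  let iterations = 0;
  try {
    while (true) {
      iterations++;
      if (iterations === pressEscAfter) requestAbort(); // stands in for the watcher thread
      checkAbort();
    }
  } catch (err) {
    if (!(err instanceof CartAborted)) throw err;
    return iterations;
  }
}
```

the cost of this approach is one extra check per loop iteration, so a real implementation might only check every N iterations.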

atesin avatar Apr 12 '23 08:04 atesin