
Exponential backoff for multiple restarts

Open iangreenleaf opened this issue 11 years ago • 4 comments

The idea is to avoid getting stuck in a horrible infinite restart loop when you have a syntax error or similar. The first restart happens immediately. A second restart within a certain time window happens after a short delay. Each subsequent restart waits a little longer, up to some maximum delay (1000ms?).

iangreenleaf, Feb 17 '14
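A minimal TypeScript sketch of the schedule proposed above follows. It assumes the supervisor can call a hook with the crash timestamp in milliseconds. `RestartBackoff`, `BackoffOptions` and their fields are hypothetical names, not node-supervisor API. The doubling factor, the 50ms initial delay and the 10s window in the example are illustrative assumptions; only the 1000ms cap comes from the issue.

```typescript
// Hypothetical helper, not part of node-supervisor: computes the delay before
// each restart using the schedule sketched in the issue.
interface BackoffOptions {
  initialDelayMs: number; // delay before the second restart inside the window
  maxDelayMs: number; // upper bound on any single delay
  windowMs: number; // a crash later than this after the previous one resets the backoff
}

class RestartBackoff {
  private readonly opts: BackoffOptions;
  private consecutive = 0; // crashes counted in the current run of quick crashes
  private lastCrashAt = -Infinity; // timestamp (ms) of the previous crash

  constructor(opts: BackoffOptions) {
    this.opts = opts;
  }

  /** Record a crash at `now` (ms) and return how long to wait before restarting. */
  nextDelay(now: number): number {
    if (now - this.lastCrashAt > this.opts.windowMs) {
      this.consecutive = 0; // quiet long enough: start the schedule over
    }
    this.lastCrashAt = now;
    this.consecutive += 1;
    if (this.consecutive === 1) {
      return 0; // first restart happens immediately
    }
    // Double the delay for each further crash: initial, 2x, 4x, ... capped at maxDelayMs.
    const delay = this.opts.initialDelayMs * 2 ** (this.consecutive - 2);
    return Math.min(delay, this.opts.maxDelayMs);
  }
}

// Example: crashes 100ms apart with an assumed 50ms initial delay and the 1000ms cap.
const backoff = new RestartBackoff({ initialDelayMs: 50, maxDelayMs: 1000, windowMs: 10000 });
const delays = [0, 100, 200, 300, 400, 500, 600, 700].map((t) => backoff.nextDelay(t));
// delays: [0, 50, 100, 200, 400, 800, 1000, 1000]
```

Measuring the window from the previous crash means a process that stays up longer than `windowMs` gets an immediate restart again the next time it dies.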

Is this being worked on? This has been a huge annoyance in OpenShift. Lots of folks leave broken apps in infinite restart loops, chewing up CPU, and we waste more CPU finding and stopping them. I would want the back-off strategy to be tunable from day one, and, as mentioned in https://github.com/isaacs/node-supervisor/issues/53, maybe it would be nice to wait for file changes after a certain number of failures.

It's also possible that you just want to take the child process's run time into consideration. This would have to be tunable, but if I see an app that crashes after running for less than a minute, I can be pretty sure I want to back off a lot, and if that happens several times in a row, I want to give up and wait for file changes.

I'd be happy to write up something if there's interest, but I don't want to duplicate effort here.

a13m, May 28 '14
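The uptime-based variant suggested in this comment could look roughly like the following sketch. It assumes the supervisor knows how long each child ran before exiting and can pause until its file watcher fires. `UptimePolicy`, its options and the `waitForChanges` decision are hypothetical; the one-minute threshold comes from the comment, while the other numbers are made up for illustration.

```typescript
// Hypothetical policy sketch, not node-supervisor API.
type Decision =
  | { kind: "restart"; delayMs: number }
  | { kind: "waitForChanges" }; // stop restarting until a watched file changes

interface UptimePolicyOptions {
  minUptimeMs: number; // runs shorter than this count as "crashed quickly"
  baseDelayMs: number; // delay after the first quick crash
  maxDelayMs: number; // cap on the delay between quick crashes
  maxQuickCrashes: number; // give up after this many quick crashes in a row
}

class UptimePolicy {
  private readonly opts: UptimePolicyOptions;
  private quickCrashes = 0; // consecutive runs shorter than minUptimeMs

  constructor(opts: UptimePolicyOptions) {
    this.opts = opts;
  }

  /** Decide what to do after a child exits having run for `uptimeMs`. */
  onExit(uptimeMs: number): Decision {
    if (uptimeMs >= this.opts.minUptimeMs) {
      this.quickCrashes = 0; // ran long enough: restart right away and reset
      return { kind: "restart", delayMs: 0 };
    }
    this.quickCrashes += 1;
    if (this.quickCrashes >= this.opts.maxQuickCrashes) {
      return { kind: "waitForChanges" };
    }
    // Back off exponentially while the app keeps dying young.
    const delayMs = this.opts.baseDelayMs * 2 ** (this.quickCrashes - 1);
    return { kind: "restart", delayMs: Math.min(delayMs, this.opts.maxDelayMs) };
  }

  /** Call when the file watcher reports a change, so the next run starts clean. */
  onFileChange(): void {
    this.quickCrashes = 0;
  }
}

// Example: five exits after 2 seconds of uptime each.
const policy = new UptimePolicy({ minUptimeMs: 60000, baseDelayMs: 1000, maxDelayMs: 30000, maxQuickCrashes: 5 });
const decisions = [2000, 2000, 2000, 2000, 2000].map((ms) => policy.onExit(ms));
// decisions: restart after 1000, 2000, 4000, 8000 ms, then waitForChanges
```

Keeping the decision a pure function of uptime and file-change events makes every threshold tunable and testable without spawning real processes.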

This is not currently being worked on (to my knowledge). I'd be very receptive to a pull request!

I've been thinking an exponential backoff alone would be good enough, because once you hit the maximum backoff (maybe 30s), restarting endlessly at 30s intervals doesn't have much of a resource impact.

Looking at the elapsed time would be great. I've been thinking it would be ideal to track the number of crashes over some window of time (maybe the past 5 minutes). That way, if this is the first crash, it only waits maybe 5ms, but if it has crashed a ton in that window, we start backing it off until we reach the max interval.

iangreenleaf, May 28 '14
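A sketch of the windowed approach, assuming crash timestamps in milliseconds: keep the crashes from the last five minutes and double a 5ms base delay for each one, capped at 30s. `WindowedBackoff` is a hypothetical name, and its defaults simply reuse the figures mentioned in the thread.

```typescript
// Hypothetical sketch, not node-supervisor API. Defaults reuse the figures from
// the thread: 5ms base delay, 30s cap, 5-minute window.
class WindowedBackoff {
  private crashTimes: number[] = []; // crash timestamps (ms) still inside the window
  private readonly baseDelayMs: number;
  private readonly maxDelayMs: number;
  private readonly windowMs: number;

  constructor(baseDelayMs = 5, maxDelayMs = 30000, windowMs = 5 * 60 * 1000) {
    this.baseDelayMs = baseDelayMs;
    this.maxDelayMs = maxDelayMs;
    this.windowMs = windowMs;
  }

  /** Record a crash at `now` (ms) and return the delay before the next restart. */
  onCrash(now: number): number {
    // Forget crashes that happened more than windowMs ago.
    this.crashTimes = this.crashTimes.filter((t) => now - t <= this.windowMs);
    this.crashTimes.push(now);
    const n = this.crashTimes.length; // crashes in the window, including this one
    // 1st crash: base delay; each extra crash in the window doubles it, up to the cap.
    return Math.min(this.baseDelayMs * 2 ** (n - 1), this.maxDelayMs);
  }
}

// Example: 14 crashes one second apart, then one after 10 quiet minutes.
const windowed = new WindowedBackoff();
const waits: number[] = [];
for (let i = 0; i < 14; i++) {
  waits.push(windowed.onCrash(i * 1000));
}
// waits: 5, 10, 20, ..., 20480, then capped at 30000 on the 14th crash
const later = windowed.onCrash(13000 + 10 * 60 * 1000); // window has emptied: 5
```

Unlike a counter of consecutive restarts, this forgets old crashes one by one as they age out of the window, so a single crash after a long healthy stretch is restarted almost immediately.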

This is definitely a +1 from me. When developing on a MacBook, a single endlessly respawning process is enough to make the fans spin up.

This could be considered a feature since you'll notice that something is awry :-) But it'd be great to have the option to make supervisor back off a bit when respawning too quickly.

mengstr, Jun 06 '14

+1

Gr8z, Oct 03 '19