bull
Graceful shutdown for stalled jobs (lock renewal)
First of all, sorry for muddying up the issue tracker; I'll be sure to help close two tickets to make up for this.
Active & Unlocked -> Queued
I see the code for stalled jobs and time-renewed locks, but I don't see the process that checks the queue for active-but-unlocked jobs and migrates them back into inactive/queued for another job consumer to take up.
Do you know where this code is located so I can review the process exactly?
Disabling auto-lock renewal
If we wanted to trigger the lock renewal manually, would that just involve setting the built-in lock renewal timer to a higher value, or would that also negatively affect the stalled job monitor? Ideally I'd like each job's own event loop to control the renewal, so that if one job were to slow down enormously it could lose the lock and another consumer could start over.
Lock renewal failure detection/notification
There is a TODO in the lock renewal code for notifying the consumer. This one is pretty important to us, because running two jobs at once could trigger API rate limiting. Is there currently a best practice for detecting a lock renewal failure? Promise cancellation comes into play here, but that's outside the scope of this question, and I'm already imagining the mess of code needed to handle it.
Here come the answers:
Active & Unlocked -> Queued
https://github.com/OptimalBits/bull/blob/master/lib/queue.js#L559
And the timer is started here: _this.startMoveUnlockedJobsToWait();
Disabling auto-lock renewal
There is no "public" way to do this, but you can hack around it by setting LOCK_RENEW_TIME to infinite and calling moveUnlockedJobsToWait manually.
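For reference, a minimal sketch of that hack, assuming LOCK_RENEW_TIME is a plain property on the queue instance and moveUnlockedJobsToWait() can be called directly; both are internals, so verify them against your bull version:

```js
const Queue = require('bull');

const queue = new Queue('work'); // defaults to local Redis

// Effectively disable the built-in renewal timer by making the
// interval enormous. LOCK_RENEW_TIME is an internal property, so
// this may break across bull versions.
queue.LOCK_RENEW_TIME = Number.MAX_SAFE_INTEGER;

// Then run the active-but-unlocked sweep on your own schedule
// instead of bull's.
setInterval(() => {
  queue.moveUnlockedJobsToWait();
}, 30 * 1000);
```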
Lock renewal failure detection/notification
Not sure I understand which notification you mean. There is an event 'stalled' that is emitted when a job has been detected as stalled, but I guess that is not the one you mean.
@manast my issue specifically regards a stalled job that is still running (just slowly). #308 has a similar issue, but theirs was resolved by simply not stalling; meanwhile, based on our legacy code quality, I expect some of our jobs to stall, and I'd like to notice that and cancel execution from within the job processing code.
I guess I could set up a listener on every worker to monitor for stalled jobs, but even then, how would I know which of the two jobs currently running is the stalled one?
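For illustration, a sketch of that per-worker listener, assuming the 'stalled' event hands the affected job to the handler (worth verifying on the bull version in question):

```js
const Queue = require('bull');
const queue = new Queue('work');

// One listener per worker; the handler receives the job that bull
// detected as stalled (assumption: verify the event payload on your
// bull version).
queue.on('stalled', (job) => {
  console.warn(`job ${job.id} was detected as stalled`);
  // But job.id alone doesn't tell this worker whether the stalled job
  // is one *it* is currently processing, which is exactly the gap
  // raised above.
});
```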
Specifically, either of these two locations is where I would expect my job processor to be able to learn immediately that the job was unable to renew its lock, so it could shut itself down. https://github.com/OptimalBits/bull/blob/d5646a069e2c73f7a1b36bcb62183ad6993d822d/lib/queue.js#L684 https://github.com/OptimalBits/bull/blob/d5646a069e2c73f7a1b36bcb62183ad6993d822d/lib/queue.js#L688
I don't see any "stalled" or "lock" parameters inside the Job object, but ideally I would need to know as soon as possible that I'm stalled and another worker has started up to try again. It looks like within the job processor I could call Job.takeLock, check whether the result is false, null, or a caught error, and then exit processing of the job.
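Something like this sketch, assuming job.takeLock() resolves falsy (or rejects) when the lock is held by another worker; the exact return contract is a bull internal, so treat this as an illustration to verify rather than a working recipe:

```js
const Queue = require('bull');
const queue = new Queue('work');

queue.process(async (job) => {
  // Hypothetical guard: job.takeLock() is internal, and the exact
  // failure value (false, null, or a rejection) is an assumption.
  const lockHeld = await job.takeLock().catch(() => false);
  if (!lockHeld) {
    // Another worker has presumably taken over; abort early.
    throw new Error('lost lock, aborting to avoid duplicate work');
  }
  return doActualWork(job); // doActualWork is a placeholder
});
```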
@manast #488 is enticing, but I could effectively stop my own processing with a simple notification, or with the ability to call job.hasLock() intermittently to confirm I'm still OK to keep processing.
The reason is that I might want to shut down gracefully, e.g. delete temporary files and close database connections.
I don't necessarily need the memory segmentation and overhead of using IPC either. I do like the option of running jobs as child processes, though; in cases where I might be running unpredictable, low-quality code it would certainly give that code a better chance and isolate its impact.
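To make the graceful-shutdown idea concrete, here is a sketch of the loop I have in mind. Note that job.hasLock() does not exist in bull today; it stands in here for the proposed check, and every helper (openDatabase, makeTempDir, splitIntoChunks, processChunk, removeTempDir) is an illustrative placeholder:

```js
const Queue = require('bull');
const queue = new Queue('work');

queue.process(async (job) => {
  // Placeholder helpers for the resources mentioned above.
  const db = await openDatabase();
  const tmpDir = await makeTempDir();
  try {
    for (const chunk of splitIntoChunks(job.data)) {
      // Proposed API (does not exist in bull yet): a cheap check
      // that this worker still holds the lock before each chunk.
      if (!(await job.hasLock())) {
        throw new Error('lock lost; another worker has taken over');
      }
      await processChunk(db, tmpDir, chunk);
    }
  } finally {
    // Graceful shutdown either way: delete temp files, close the DB.
    await removeTempDir(tmpDir);
    await db.close();
  }
});
```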
Bump, would also love to see a graceful shutdown in the PATTERNS section.