Inconsistencies in job's state
kue version: 0.11.6
I'm seeing a strange phenomenon with some of our jobs: there are jobs in the {q}:jobs:active ZSET whose state is set to failed.
I tried to work out how this is possible but couldn't. My first suspicion was an external restart of the process during the job.state() function, but MULTI is used there, so that shouldn't cause any inconsistencies.
The queue.checkActiveJobTtl() mechanism runs every second. In our case, certain events leave us with a lot of these inconsistent jobs, and they get processed every second, which puts unnecessary load on our servers.
The simplest solution would be to add:
job._state = 'active';
here: https://github.com/Automattic/kue/blob/master/lib/kue.js#L245. However, on one server I've noticed the same kind of inconsistency with jobs in the "inactive" box (these are in the inactive ZSET but their state is set to "failed").
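To spot the mismatch described above, a check along these lines may help. It is only a sketch: findInconsistent is a hypothetical helper, not part of kue, and it assumes the ZSET members have already been read from Redis and reduced to plain job ids, with each job's stored state collected into a Map.

```javascript
// Hypothetical helper (not a kue API).
// expectedState: the state implied by the ZSET name, e.g. 'active' or 'inactive'.
// ids: job ids read from that ZSET (assumed already reduced to plain ids).
// states: Map of job id -> state stored on the job itself.
function findInconsistent(expectedState, ids, states) {
  // Keep every id whose stored state disagrees with the ZSET it sits in.
  return ids.filter((id) => states.get(id) !== expectedState);
}

// Made-up sample data: job 2 sits in the active ZSET but is marked failed.
const sampleStates = new Map([[1, 'active'], [2, 'failed'], [3, 'active']]);
const mismatched = findInconsistent('active', [1, 2, 3], sampleStates);
console.log(mismatched);
```

The same function covers the inactive case by passing 'inactive' as the expected state.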
I finally know what the problem is.
It's the refreshTtl function that puts these jobs back into the active list: https://github.com/Automattic/kue/blob/master/lib/queue/job.js#L346
This refreshTtl function is called when progress is set.
The thing is that we don't always wait for the progress callback to be called.
So from time to time a job finishes, but the progress update (and thus refreshTtl) runs afterwards and adds the job back to the active ZSET.
Unfortunately, refreshTtl and Job.prototype.progress do not accept callbacks, so it's impossible to fix this on our side.
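The ordering problem can be reproduced without Redis. The sketch below is a simplified in-memory model, not kue's actual code: createSim, start, progress, finish and drain are all made-up names, a Set stands in for the active ZSET, and the deferred queue stands in for a write that lands after the caller has moved on. The optional done argument to progress is the hypothetical callback that would make the ordering controllable.

```javascript
// In-memory stand-ins for Redis; NOT kue's real data structures.
function createSim() {
  const activeZset = new Set(); // plays the role of {q}:jobs:active
  const state = new Map();      // job id -> state string
  const queue = [];             // deferred writes, applied by drain()

  return {
    activeZset,
    state,
    start(id) { state.set(id, 'active'); activeZset.add(id); },
    // Fire-and-forget: the TTL refresh (re-adding to the active set) is deferred.
    // `done` is a hypothetical callback invoked once the refresh has landed.
    progress(id, done) {
      queue.push(() => { activeZset.add(id); if (done) done(); });
    },
    finish(id, finalState) { state.set(id, finalState); activeZset.delete(id); },
    drain() { while (queue.length) queue.shift()(); },
  };
}

// Racy order: the job finishes before the deferred refresh lands.
const racy = createSim();
racy.start(1);
racy.progress(1);
racy.finish(1, 'failed');
racy.drain();
const racyLeak = racy.activeZset.has(1) && racy.state.get(1) === 'failed';

// Ordered: the job finishes only inside the progress callback.
const ordered = createSim();
ordered.start(2);
ordered.progress(2, () => ordered.finish(2, 'failed'));
ordered.drain();
const orderedLeak = ordered.activeZset.has(2);

console.log({ racyLeak, orderedLeak });
```

In the racy run the failed job ends up back in the active set, which matches the symptom above. In the ordered run it does not, which is why a callback on progress and refreshTtl would let callers avoid the problem.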