bullmq Add job.runAt property to expose timestamp that job is supposed to run at

Fixes: #1728

feat(job): add runAt field to job key in redis
feat(job): add runAt property to Job
test(job): extend job tests to check runAt correctness

Mar 09 '23 09:03 hyperair

Thank you for the PR. I think that it would be better to retrieve runtAt lazily, as it is difficult to keep it in sync at all times. Since runAt can be calculated from the delayed set, we can just return it when it is actually needed, for example when calling getJobs("delayed"), maybe Job could have as well an async method getRunAt() that can be called when needed.

Mar 09 '23 11:03 manast

@manast While that works for jobs that are in the delayed state, it doesn't work for calculating something like a "job wait time" metric, i.e. how long after after a job's runAt that it actually starts running, because there's no timestamp for when a job was enqueued into waiting, and once the job is removed from the delayed zset, there's no score to look at either.

I don't think storing the job's runAt on the job key and keeping it in sync is any worse than the way delay is stored on the job key today, and I think the way this PR does it, it's a more useful property and stays in sync better than job.delay does.

Mar 10 '23 02:03 hyperair

@hyperair could we delete the "delay" field on the job then as it does not seem to be useful? Regarding runAt, I think it could be misleading as it should not represent "when" the job will be run, it represents how much the job will wait at least before it is run, or the timestamp from which it could run, but for different reasons it could run after that, so "processedOn" and "runAt" would not be the same most of the time.

Mar 16 '23 13:03 manast

@hyperair then the other question is, should "runAt" be cleared after the job has been moved from the delayed set to the wait list?

Mar 16 '23 13:03 manast

so in a sense "runAt" would be the same as the score in the delay set, just a cached value that makes it more convenient when getting the jobs.

Mar 16 '23 13:03 manast

@manast While that works for jobs that are in the delayed state, it doesn't work for calculating something like a "job wait time" metric, i.e. how long after after a job's runAt that it actually starts running, because there's no timestamp for when a job was enqueued into waiting, and once the job is removed from the delayed zset, there's no score to look at either.

Ok, this seems to be quite a niche use-case, since usually you are mostly interested in the total time it took from when the job was added to a queue up to when it was completed 🤔

Mar 16 '23 14:03 manast

@manast

could we delete the "delay" field on the job then as it does not seem to be useful?

Yep, it'll be an API break, but if you're okay with it I'll drop the delay field.

Regarding runAt, I think it could be misleading as it should not represent "when" the job will be run, it represents how much the job will wait at least before it is run, or the timestamp from which it could run, but for different reasons it could run after that, so "processedOn" and "runAt" would not be the same most of the time.

Hmm, that's true. Will one of the following work, or do you have suggestions for a better name?

delayedUntil -- might not be good either since it implies that the job is delayed, which may not be the case for a job that was added without delay
notBefore
scheduledFor

then the other question is, should "runAt" be cleared after the job has been moved from the delayed set to the wait list? so in a sense "runAt" would be the same as the score in the delay set, just a cached value that makes it more convenient when getting the jobs.

Nope, the idea is for runAt (or alternative name) to be the timestamp from which a job should be available for execution

For delayed -> waiting -> active, it'll be the same as the delayedTimestamp encoded in the score of the job
For job.changeDelay or job.moveToDelayed, same as above
For waiting -> active, i.e. jobs scheduled without delay, it should be the same as job.timestamp
For (completed|failed) -> waiting -> active, it should be the timestamp at which job.retry() was called

Ok, this seems to be quite a niche use-case, since usually you are mostly interested in the total time it took from when the job was added to a queue up to when it was completed :thinking:

In my case I'm generating two timing metrics for this:

job_wait_time -- the time a job spent waiting in queue, excluding the time before runAt (useful for troubleshooting queue backlog, e.g. insufficient workers to process the jobs, or jobs not getting promoted in a timely fashion..)
job_execution_time -- the time a job spent actually executing the job (useful for troubleshooting other kinds of trouble, e.g. slow API calls, other kinds of bugs)

A single combined metric summing the above two durations up makes it more difficult to differentiate between these two kinds of problems, but even that's not really possible to generate with bullmq master right now.

Mar 17 '23 02:03 hyperair

could we delete the "delay" field on the job then as it does not seem to be useful?

Yep, it'll be an API break, but if you're okay with it I'll drop the delay field.

As it turns out, there's one purpose for delay that runAt doesn't cover -- updateParentDepsIfNeeded uses it to schedule the parent into delayed state instead of waiting

Mar 17 '23 07:03 hyperair

This would be a very useful addition for me as I cannot accurately tell when a job will after calling job.changeDelay

Feb 16 '24 21:02 Riley-Brown

Would also love this feature since I use changeDelay all the time. Using bull board and its quite bad to not know when a job will be executed.

Jul 15 '24 20:07 lindesvard

@lindesvard I will try to improve it somehow.

Jul 16 '24 09:07 manast

bullmq bullmq copied to clipboard

Add job.runAt property to expose timestamp that job is supposed to run at

bullmq
bullmq copied to clipboard