[Feature request] Queue age (age of oldest job) metric

Open aseemk opened this issue 8 months ago • 4 comments

Is your feature request related to a problem? Please describe.

We monitor the length of our BullMQ queues using getJobCounts(). That's helpful and sufficient for most of our queues where we enqueue work continuously or at a low volume.

However, for bulk processing use cases, we may enqueue many jobs all at once (e.g. 1M jobs overnight each night), and for those, queue length isn't the right thing to monitor, since we expect a backlog every time we first enqueue. We do aim to process those jobs and clear the queue within some time goal (SLA/SLO), though, so we'd really like to measure and monitor the age/staleness of the queue.

This is also how we'd prefer to implement worker autoscaling.

Describe the solution you'd like

SQS and other queues have a metric for this. SQS calls its version ApproximateAgeOfOldestMessage.

Could BullMQ implement and provide a metric like that too?

Describe alternatives you've considered

Not sure to be honest. Would love to be able to measure this ourselves somehow, but it's not obvious to me how!

Additional context

Thank you for the library and consideration!

aseemk, Apr 22 '25 05:04

Thanks for this FR. If I understand you correctly, this should be pretty easy to implement, at least for standard jobs, as we would just need to pick the oldest element in the wait list and return its timestamp (or Date.now() - timestamp anyway). That would be an O(1) operation as well so pretty fast.

manast, Apr 22 '25 06:04

If it's easy to query this by type/status (e.g. waiting vs delayed etc.), just like getJobCounts(), that'd be even slicker!

aseemk, Apr 23 '25 16:04

You can already do this using getJobs: https://api.docs.bullmq.io/classes/v5.QueueGetters.html#getJobs.getJobs-1

Just fetch a single job and set the asc argument appropriately, so you get the oldest one in every case. For delayed jobs you will not get the oldest, and that would not be achievable with a new API either, but for scaling purposes I'm not sure it would be that valuable anyway.

manast, Apr 23 '25 17:04

Thanks! This worked great. If helpful to others, here's the code I wrote:

const [oldestJob] = await bullQueue.getJobs(
  // Exclude `active`, `completed`, `failed`, and `delayed` (scheduled for the future) jobs.
  ['wait', 'waiting', 'waiting-children', 'prioritized', 'paused'],
  0, // start index, inclusive
  0, // end index, inclusive
  true, // ascending order (oldest first)
);

Metrics.gauge(
  `queue.oldest_age_secs`,
  oldestJob
    ? getSecondsBetween(new Date(oldestJob.timestamp), new Date())
    : 0,
  tags,
);
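A note for anyone reusing the snippet above: `getSecondsBetween` is not shown in the thread, and when several job types are passed to `getJobs`, the start/end range may be applied to each type separately, so more than one job can come back and the first one is not necessarily the oldest overall. That behaviour is an assumption worth checking against your BullMQ version. Here is a minimal sketch that takes the smallest `timestamp` across whatever is returned. `TimestampedJob`, `oldestAgeSecs`, and this `getSecondsBetween` are hypothetical names for illustration, not BullMQ APIs.

```typescript
// Only the field this sketch needs from a job: `timestamp`,
// the creation time in epoch milliseconds.
interface TimestampedJob {
  timestamp: number;
}

// Hypothetical stand-in for the `getSecondsBetween` helper used above.
// Clamps to 0 so clock skew never produces a negative age.
function getSecondsBetween(start: Date, end: Date): number {
  return Math.max(0, (end.getTime() - start.getTime()) / 1000);
}

// Returns the age in seconds of the oldest job in `jobs`, or 0 if there is none.
// Entries may be undefined (assumed possible if a job is removed while it is
// being fetched), so those are skipped.
function oldestAgeSecs(
  jobs: Array<TimestampedJob | undefined>,
  now: Date = new Date(),
): number {
  let oldest: number | undefined;
  for (const job of jobs) {
    if (!job) continue;
    if (oldest === undefined || job.timestamp < oldest) {
      oldest = job.timestamp;
    }
  }
  return oldest === undefined ? 0 : getSecondsBetween(new Date(oldest), now);
}
```

With this in place, the gauge value could be computed as `oldestAgeSecs(jobs)`, where `jobs` is the full array returned by `getJobs` rather than only its first element.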

Feel free to close, or keep open if you think it might still be nice to pull into BullMQ itself. Thanks again!

aseemk, May 05 '25 08:05