bullmq
[Bug]: removeOnFail with Age Not Working
Version
4.10.0
Platform
NodeJS
What happened?
removeOnFail works as intended when supplied a boolean. However, when supplied an age, the failed job stays in the queue indefinitely.
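To make the expectation concrete, here is a minimal sketch of the two option shapes involved and of the arithmetic the report relies on. The types and the `isPastAge` helper are hypothetical local stand-ins, not imports from bullmq, and the sketch assumes `age` is expressed in seconds.

```typescript
// Hypothetical local types mirroring the option shapes used in this report.
// They are stand-ins for illustration, not bullmq's own type definitions.
interface KeepJobsSketch {
  age?: number;   // assumption: maximum age in seconds
  count?: number; // assumption: maximum number of jobs to keep
}
type RemoveOnFailSketch = boolean | KeepJobsSketch;

// The two configurations compared in this issue.
const removeImmediately: RemoveOnFailSketch = true;
const removeAfterAge: RemoveOnFailSketch = { age: 2 };

// Hypothetical helper: true when a failed job is eligible for removal
// under the given option at time nowMs.
function isPastAge(
  failedAtMs: number,
  nowMs: number,
  opt: RemoveOnFailSketch,
): boolean {
  if (typeof opt === 'boolean') return opt;
  if (opt.age === undefined) return false;
  return nowMs - failedAtMs > opt.age * 1000;
}

// A job that failed at t = 0 and is checked 5 seconds later, as in the report.
const expectedGone = isPastAge(0, 5 * 1000, removeAfterAge);
```

Under these assumptions a job checked 5 seconds after failing is past its 2-second age, which is why the report expects `getJob` to return nothing.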
How to reproduce.
import { Queue, QueueEvents, Worker } from 'bullmq';
import { assert } from 'console';
// utils
async function sleep(ms: number) {
  return new Promise((resolve) => {
    setTimeout(resolve, ms);
  });
}
const DEFAULT_JOB_NAME = '__default__';
const REMOVE_ON_FAIL_IMMEDIATE = 'remove on fail immediate';
const RemoveOnFailImmediateQueue = new Queue(REMOVE_ON_FAIL_IMMEDIATE, {
  connection: { host: 'localhost' },
  defaultJobOptions: {
    removeOnComplete: true,
    removeOnFail: true,
    attempts: 1,
  }
});
const RemoveOnFailImmediateQueueEvents = new QueueEvents(REMOVE_ON_FAIL_IMMEDIATE, { connection: { host: 'localhost' } });
RemoveOnFailImmediateQueueEvents.on('completed', ({ jobId }) => {
  console.log('done testing remove immediate');
});
RemoveOnFailImmediateQueueEvents.on(
  'failed',
  ({ jobId }: { jobId: string; }) => {
    console.log('Error in RemoveOnFailImmediateQueue');
  },
);
const RemoveOnFailImmediateWorker = new Worker(REMOVE_ON_FAIL_IMMEDIATE, async job => {
  throw new Error('TEST ERROR');
}, { connection: { host: 'localhost' } });
async function testRemoveOnFailImmediate() {
  const jobId = 'removeImmediate'
  await RemoveOnFailImmediateQueue.add(DEFAULT_JOB_NAME, {}, { jobId })
  await sleep(10); // Sleep not needed. Here for posterity.
  const job = await RemoveOnFailImmediateQueue.getJob(jobId);
  // Manast I know you hate null ;)
  const jobIsUndefined = !job;
  assert(jobIsUndefined)
}
const REMOVE_ON_FAIL_WITH_AGE = 'remove on fail with age';
const RemoveOnFailWithAgeQueue = new Queue(REMOVE_ON_FAIL_WITH_AGE, {
  connection: { host: 'localhost' },
  defaultJobOptions: {
    removeOnComplete: true,
    removeOnFail: { age: 2 },
    attempts: 1,
  }
});
const RemoveOnFailWithAgeQueueEvents = new QueueEvents(REMOVE_ON_FAIL_WITH_AGE, { connection: { host: 'localhost' } });
RemoveOnFailWithAgeQueueEvents.on('completed', ({ jobId }) => {
  console.log('done testing remove with age');
});
RemoveOnFailWithAgeQueueEvents.on(
  'failed',
  ({ jobId }: { jobId: string; }) => {
    console.log('Error in RemoveOnFailWithAgeQueue');
  },
);
const RemoveOnFailWithAgeWorker = new Worker(REMOVE_ON_FAIL_WITH_AGE, async job => {
  throw new Error('TEST ERROR');
}, { connection: { host: 'localhost' } });
async function testRemoveOnFailWithAge() {
  const jobId = 'removeWithAge'
  await RemoveOnFailWithAgeQueue.add(DEFAULT_JOB_NAME, {}, { jobId })
  // Wait 5 seconds (in milliseconds), longer than the configured age of 2.
  const waitMs = 5 * 1000;
  await sleep(waitMs);
  const job = await RemoveOnFailWithAgeQueue.getJob(jobId);
  const jobIsUndefined = !job;
  assert(jobIsUndefined, 'testRemoveOnFailWithAge: Job Still Exists In Queue.')
  if (job) {
    const failed = await job.isFailed();
    assert(!failed, 'testRemoveOnFailWithAge: Job is marked as failed.')
  }
}
testRemoveOnFailImmediate()
  .then(testRemoveOnFailWithAge)
  .then(() => sleep(1000)) // Sleep added to catch logs prior to exit
  .finally(() => process.exit(0))
Relevant log output
Logs from running this code:
Error in RemoveOnFailImmediateQueue
Error in RemoveOnFailWithAgeQueue
Assertion failed: testRemoveOnFailWithAge: Job Still Exists In Queue.
Assertion failed: testRemoveOnFailWithAge: Job is marked as failed.
### Code of Conduct
- [X] I agree to follow this project's Code of Conduct
I'm not sure what the difference is, but we have a test that covers precisely the scenario you describe in your issue: https://github.com/taskforcesh/bullmq/blob/master/tests/test_worker.ts#L467
Yeah, @manast.
I created a blank npm project with just this test case in it to verify I wasn't going crazy. The job is definitely stuck in the queue: when I open the Redis CLI and run hgetall on the key for the job, it's still there, marked as failed, and the removeOnFail setting is correct.
It's not getting removed for some reason.
I am running NodeJS 14, and my local Redis version is 7.
Your code is too long and complex. Why don't you start from working code, such as the test case, and build from there? You will surely find the problem that way.
@manast
This is a totally fake test case I made just for simple reproduction.
I just wanted to see if I could reproduce it. My actual code is written in NestJS using the BullQueueMQ package and I was observing failed jobs stuck in the queue in production and was trying to understand why. I am running the latest version of everything bull related.
I couldn’t reproduce running your test case locally. I spent a few hours trying to understand the difference and gave up for now.
Ok, let me know when you can produce a test based on the working tests from BullMQ and we can take a deeper look into it.
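One difference worth checking when building such a test: BullMQ's guide on auto-removal of jobs describes the cleanup as lazy, meaning old jobs are pruned only when another job completes or fails, not on a timer. The thread does not confirm that this explains the report, so treat the following as a toy in-memory model of that documented behavior under that assumption. `LazyFailedSet` and its methods are hypothetical names and are not BullMQ's implementation.

```typescript
// Toy model of lazy, age-based cleanup. NOT BullMQ's implementation;
// it only illustrates "prune when another job finishes" semantics.
class LazyFailedSet {
  private failedAt = new Map<string, number>();
  private ageSeconds: number;

  constructor(ageSeconds: number) {
    this.ageSeconds = ageSeconds;
  }

  // Record a failure, then prune entries older than the age window.
  // Pruning happens only inside this call, never in the background.
  recordFailure(jobId: string, nowMs: number): void {
    this.failedAt.set(jobId, nowMs);
    const cutoff = nowMs - this.ageSeconds * 1000;
    const expired: string[] = [];
    this.failedAt.forEach((at, id) => {
      if (at < cutoff) expired.push(id);
    });
    expired.forEach((id) => this.failedAt.delete(id));
  }

  has(jobId: string): boolean {
    return this.failedAt.has(jobId);
  }
}

// One job fails at t = 0 with age = 2 seconds, as in the reproduction.
const model = new LazyFailedSet(2);
model.recordFailure('removeWithAge', 0);

// Time passing alone prunes nothing in this model, so the job is still present.
const stillThereAfterFiveSeconds = model.has('removeWithAge');

// A second failure at t = 5 s triggers the prune and drops the first job.
model.recordFailure('another', 5000);
const goneAfterSecondFailure = !model.has('removeWithAge');
```

If the real queue behaves like this model, a reproduction with a single failed job and a sleep would keep that job around, while a test that fails a second job after the age has elapsed would see the first one removed.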