bullmq
[Bug]: Cannot retry stalled job from Taskforce UI
Version
5.59.0
Platform
NodeJS
What happened?
We recently had an incident in which many jobs stalled. After resolving the issue, we retried the jobs from the dashboard, but they immediately failed again with an error saying they had stalled.
Before:
Attempting Retry:
https://github.com/user-attachments/assets/b802b092-5b70-466a-97fd-11516bcbef7b
After:
How to reproduce.
It has been difficult to reproduce this in a test script. When it happened, some existing jobs had already stalled. We use maxStalledCount: 2.
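For readers unfamiliar with the setting mentioned above: as documented, maxStalledCount bounds how many times a job can be recovered from the stalled state back to wait before it is moved to failed. The sketch below is a simplified model of that rule only, not BullMQ's actual code. `ModelJob`, `StallOutcome`, and `onStalled` are hypothetical names made up for illustration, and the sketch makes no claim about why the retried jobs in this report fail again.

```typescript
// Simplified model of a stalled-job limit. All names are hypothetical;
// this is not how BullMQ implements the check internally.
type StallOutcome = "wait" | "failed";

interface ModelJob {
  id: string;
  stalledCount: number; // times this job has been detected as stalled
}

// Called each time the job is detected as stalled.
// The job goes back to "wait" until the limit is exceeded, then it fails.
function onStalled(job: ModelJob, maxStalledCount: number): StallOutcome {
  job.stalledCount += 1;
  return job.stalledCount > maxStalledCount ? "failed" : "wait";
}

// With maxStalledCount = 2, the third stall is the one that fails the job.
const job: ModelJob = { id: "1204749", stalledCount: 0 };
const outcomes: StallOutcome[] = [1, 2, 3].map(() => onStalled(job, 2));
console.log(outcomes.join(", ")); // wait, wait, failed
```

In this model the counter only ever increases, so whether a retry from the dashboard starts from a fresh count is exactly the kind of detail worth checking when reproducing the problem.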
Relevant log output
While our application was in a bad state (before the application issue was resolved) and jobs were stalling, we saw these errors in the logs:
Error: Missing lock for job 1204749. moveToDelayed
at ScriptsPro.finishedErrors (/usr/src/app/node_modules/@taskforcesh/bullmq-pro/node_modules/bullmq/src/classes/scripts.ts:1688:16)
at ScriptsPro.finishedErrors (/usr/src/app/node_modules/@taskforcesh/bullmq-pro/src/classes/scripts-pro.ts:999:22)
at ScriptsPro.moveToDelayed (/usr/src/app/node_modules/@taskforcesh/bullmq-pro/node_modules/bullmq/src/classes/scripts.ts:1162:18)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async <anonymous> (/usr/src/app/node_modules/@taskforcesh/bullmq-pro/node_modules/bullmq/src/classes/job.ts:828:22)
at async WorkerPro.handleFailed (/usr/src/app/node_modules/@taskforcesh/bullmq-pro/node_modules/bullmq/src/classes/worker.ts:971:24)
at async <anonymous> (/usr/src/app/node_modules/@taskforcesh/bullmq-pro/node_modules/bullmq/src/classes/worker.ts:889:26)
at async WorkerPro.retryIfFailed (/usr/src/app/node_modules/@taskforcesh/bullmq-pro/node_modules/bullmq/src/classes/worker.ts:1247:16)
and
Error: could not renew lock for job 1204770
at <anonymous> (/usr/src/app/node_modules/@taskforcesh/bullmq-pro/src/classes/worker-pro.ts:424:15)
at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
at async WorkerPro.extendLocks (/usr/src/app/node_modules/@taskforcesh/bullmq-pro/src/classes/worker-pro.ts:379:5)
at async Timeout._onTimeout (/usr/src/app/node_modules/@taskforcesh/bullmq-pro/node_modules/bullmq/src/classes/worker.ts:1208:15)
Thanks for reporting the issue. We are looking into it.