Retrying stalled `retry_stalled_jobs` jobs :)
We have a `retry_stalled_jobs` job, pretty much per the docs:

```python
@app.periodic(cron="* * * * *", queueing_lock="uptool.tasks.retry_stalled_jobs", lock="uptool.tasks.retry_stalled_jobs")
@app.task()
async def retry_stalled_jobs(timestamp: int):
    stalled_jobs = await app.job_manager.get_stalled_jobs()
    for job in stalled_jobs:
        await app.job_manager.retry_job(job)
```
But of course that job can get stalled too, and then everything stays stalled.
What is the recommended workaround?
Why does the user even need to handle these scenarios manually? Shouldn't Procrastinate be taking care of its internal state (including detecting stalled jobs) automatically?
Hey @jakajancar,
You indeed spotted an oversight in the documentation.
What I can suggest is to:
- remove the `lock` for this retry-stalled-jobs task
- filter this task out of the list of returned stalled jobs so it never retries itself
- mark any stalled job that corresponds to this task as failed instead
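The second suggestion can be sketched as a small filter applied before retrying. This is a hypothetical helper, not Procrastinate API; `StalledJob` here is a stand-in for Procrastinate's job objects, assuming only that they expose a `task_name` attribute:

```python
from dataclasses import dataclass


# Stand-in for a Procrastinate job record; only the fields we need here.
@dataclass
class StalledJob:
    id: int
    task_name: str


# Name of the watchdog task itself (matches the decorator in the snippet above).
RETRY_TASK_NAME = "uptool.tasks.retry_stalled_jobs"


def jobs_to_retry(stalled_jobs):
    """Exclude the watchdog task so it never re-queues its own stalled runs."""
    return [job for job in stalled_jobs if job.task_name != RETRY_TASK_NAME]
```

Inside the periodic task, the loop would then iterate over `jobs_to_retry(stalled_jobs)` instead of `stalled_jobs` directly.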
We need to amend the documentation.
If that doesn't work, please report back.
There is certainly value in integrating this into the library.
That might require some more thinking on what the default behaviour should be.
On the other hand, making the consumer of this library responsible for retrying stalled jobs yields the most flexibility.
Thanks @onlyann. I resolved the issue, so please just consider this a feature request: Procrastinate should handle its own "garbage collection".
The docs only use `queueing_lock`, whereas I had both `queueing_lock` and `lock`. If I hadn't had the latter, I think the problem wouldn't have occurred, so closing.
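For reference, the variant without the `lock` argument looks like this. A sketch mirroring the snippet above, under the assumption (per this thread) that a queueing lock alone prevents duplicate queueing without letting a stalled run block subsequent executions:

```python
# Same task as above, but with only queueing_lock: a stalled run no longer
# holds an execution lock that would keep the next periodic run from starting.
@app.periodic(cron="* * * * *", queueing_lock="uptool.tasks.retry_stalled_jobs")
@app.task()
async def retry_stalled_jobs(timestamp: int):
    stalled_jobs = await app.job_manager.get_stalled_jobs()
    for job in stalled_jobs:
        await app.job_manager.retry_job(job)
```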