Retry stalled "retry stalled jobs" jobs :)

Open jakajancar opened this issue 10 months ago • 2 comments

We have a retry_stalled_jobs job, pretty much per the docs:

@app.periodic(cron="* * * * *", queueing_lock="uptool.tasks.retry_stalled_jobs", lock="uptool.tasks.retry_stalled_jobs")
@app.task()
async def retry_stalled_jobs(timestamp: int):
    stalled_jobs = await app.job_manager.get_stalled_jobs()
    for job in stalled_jobs:
        await app.job_manager.retry_job(job)

But of course that job can get stalled too:

Image

And now everything is stalled.

What is the recommended workaround?

Why does the user even need to handle these scenarios manually? Shouldn't procrastinate be taking care of its internal state (including detecting stalled jobs) automatically?

jakajancar, Jun 02 '25 21:06

Hey @jakajancar,

You indeed spotted an oversight in the documentation.

What I can suggest is to

  • remove the lock for this retry stalled jobs task
  • filter out this task from being retried in the list of returned stalled jobs
  • mark any stalled job that corresponds to this task as failed
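The three suggestions above could be sketched roughly as follows. This is only a sketch under stated assumptions, not a documented recipe: `partition_stalled`, `handle_stalled_jobs`, and the `own_task_name` and `failed_status` parameters are hypothetical names, and it assumes that jobs expose a `task_name` attribute and that the job manager offers `finish_job(job, status, delete_job=...)` next to the `get_stalled_jobs()` and `retry_job()` calls from the original snippet. Please check those against your installed procrastinate version.

```python
from typing import Any, Iterable, List, Tuple


def partition_stalled(
    jobs: Iterable[Any], own_task_name: str
) -> Tuple[List[Any], List[Any]]:
    """Split stalled jobs into (jobs to retry, stalled runs of the retry task)."""
    to_retry: List[Any] = []
    own: List[Any] = []
    for job in jobs:
        # Assumption: each job carries the name of its task in `task_name`.
        (own if job.task_name == own_task_name else to_retry).append(job)
    return to_retry, own


async def handle_stalled_jobs(
    job_manager: Any, own_task_name: str, failed_status: Any
) -> Tuple[int, int]:
    """Retry stalled jobs, but mark stalled runs of the retry task as failed.

    `job_manager` is duck-typed; in an application it would be
    `app.job_manager`. `failed_status` is whatever value the library uses
    for a failed job (assumed here, not confirmed by this thread).
    """
    stalled = await job_manager.get_stalled_jobs()
    to_retry, own = partition_stalled(stalled, own_task_name)

    # Stalled runs of the retry task itself are not retried; the next
    # periodic run replaces them. Assumed signature: finish_job(job, status, delete_job).
    for job in own:
        await job_manager.finish_job(job, failed_status, delete_job=False)

    # Everything else is retried as in the original snippet.
    for job in to_retry:
        await job_manager.retry_job(job)

    return len(to_retry), len(own)
```

Inside the periodic task (declared without `lock`), this would be called with `app.job_manager`, the task's own name, and the library's failed status, which I assume to be `procrastinate.jobs.Status.FAILED`.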

We need to amend the documentation.

If that doesn't work, please report back.

There is certainly value in integrating this into the library.

That might require some more thinking about what the default behaviour should be.

On the other hand, making the consumer of this library responsible for retrying stalled jobs yields the most flexibility.

onlyann, Jun 03 '25 09:06

Thanks @onlyann. I resolved the issue, so please just consider this a feature request: Procrastinate should handle its own "garbage collection".

jakajancar, Jun 03 '25 19:06

The docs only use queueing_lock, whereas I had both queueing_lock and lock. Without the latter, I think the problem wouldn't have occurred, so closing.

jakajancar, Jul 25 '25 22:07