procrastinate
procrastinate copied to clipboard
Retry Failed Job
This is an action intended to be manually invoked that put the failed job's state back to todo, while keeping the events history.
Only integrated within django (for now) It's an Early submission (draft) for the purpose of getting feedback the maintainers. I would like to get confirmation that's something they'd like to consider, assuming I'll push it through with tests and updated documentation. only Tested manually on my machine as it is.
Closes N/A
Successful PR Checklist:
- [x] Tests
- [ ] (not applicable?)
- [x] Documentation
- [ ] (not applicable?)
PR label(s):
- [ ] https://github.com/procrastinate-org/procrastinate/labels/PR%20type%3A%20breaking%20%F0%9F%92%A5
- [x] https://github.com/procrastinate-org/procrastinate/labels/PR%20type%3A%20feature%20%E2%AD%90%EF%B8%8F
- [ ] https://github.com/procrastinate-org/procrastinate/labels/PR%20type%3A%20bugfix%20%F0%9F%95%B5%EF%B8%8F
- [ ] https://github.com/procrastinate-org/procrastinate/labels/PR%20type%3A%20miscellaneous%20%F0%9F%91%BE
- [ ] https://github.com/procrastinate-org/procrastinate/labels/PR%20type%3A%20dependencies%20%F0%9F%A4%96
- [x] https://github.com/procrastinate-org/procrastinate/labels/PR%20type%3A%20documentation%20%F0%9F%93%9A
Hello!
I like the idea and large parts of the work already done.
I'm not sure I think a retried status is the best though. We had issues when we introduced a aborting state, which we had to undo. I think a job can be both retried AND todo, or doing, or suceeded or failed. If we want to capture that, it would rather be, I think, an event in the events table (capturing the fact a same job might be retried multiple times).
I like the idea of Django admin actions. but this immediately bears the idea of also aborting a task through the admin, though that might be for a later PR.
Hi, thanks for your feedback. I'm not sure I completely understand your recommendations though :)
I'm not sure I think a retried status is the best though.
I agree I haven't introduced one.
If we want to capture that, it would rather be, I think, an event in the events table
Exactly, we are in agreement. and the PR does that already ...
Am I hallucinating or you reviewed it a bit too quickly :)
Nevertheless, I will add some tests it will make feature clearer on how it's supposed to work, and maybe it will reveal my hallucinations. Stay tuned
In addition to django admin, do you see a need to expose this feature on the cli ? Where else could it be useful ?
Hm, maybe I have, I'll take some time time and reread through, apologies.
Coverage report
This PR does not seem to contain any modification to coverable code.
First: sorry it took me so long to dive into this.
No need, I assume we are all volunteers here.
This seems, nice, I'm wondering if you can share a bit on what led you to create a new function
procrastinate_retry_failed_job_v1rather than adaptingprocrastinate_retry_job_v1. I'm not saying it's not a good choice, it might be perfectly fine, but I feel I'm missing a bit of the reasons (might be "to avoid messing with existing code paths" ?)
Because the signature is different, This version don't accept retry_at (as explained in my other comment).
No other reasons I can remember of. I'll gladly change the code and follow your guidance if you prefer an alternative.
All in all, this looks great :) I'd love an opinion from @medihack
thanks :)
This seems, nice, I'm wondering if you can share a bit on what led you to create a new function
procrastinate_retry_failed_job_v1rather than adaptingprocrastinate_retry_job_v1. I'm not saying it's not a good choice, it might be perfectly fine, but I feel I'm missing a bit of the reasons (might be "to avoid messing with existing code paths" ?)
I also find it a little unusual to have two retry functions now, but I can certainly adapt to it and understand the reasoning behind it. Just a counterproposal... how about something like this:
CREATE FUNCTION procrastinate_retry_job_v2(
job_id bigint,
retry_at timestamp with time zone,
new_priority integer,
new_queue_name character varying,
new_lock character varying
) RETURNS void LANGUAGE plpgsql AS $$
DECLARE
_job_id bigint;
_abort_requested boolean;
_current_status procrastinate_job_status;
BEGIN
SELECT status, abort_requested FROM procrastinate_jobs
WHERE id = job_id AND status IN ('doing', 'failed')
FOR UPDATE
INTO _current_status, _abort_requested;
IF _current_status = 'doing' AND _abort_requested THEN
UPDATE procrastinate_jobs
SET status = 'failed'::procrastinate_job_status
WHERE id = job_id AND status = 'doing'
RETURNING id INTO _job_id;
ELSE
UPDATE procrastinate_jobs
SET status = 'todo'::procrastinate_job_status,
attempts = attempts + 1,
scheduled_at = retry_at,
priority = COALESCE(new_priority, priority),
queue_name = COALESCE(new_queue_name, queue_name),
lock = COALESCE(new_lock, lock)
WHERE id = job_id AND status IN ('doing', 'failed')
RETURNING id INTO _job_id;
END IF;
IF _job_id IS NULL THEN
RAISE 'Job was not found or has an invalid status to retry (job id: %)', job_id;
END IF;
END;
$$;
The retry_at could be set to the present time or we can make all parameters fully optional by using DEFAULT NULL and adding scheduled_at = COALESCE(retry_at, scheduled_at).
Additionally, as already suggested, it would be beneficial to have a way to execute it using the CLI (although we can also add this feature using a second PR later).
That all said, it's absolutely a nice feature 🙂.
Thanks, I just pushed changes reflecting your suggestion.
Now we are using the same retry procedure for TODO and FAILED jobs.
Django integration, defaults on retry_at=now(), for sake of simplicity.
I don't understand the failure related to versioning of migrations. I tried multiple version schemes. none of them worked.
I don't understand the failure related to versioning of migrations. I tried multiple version schemes. none of them worked.
This system is quite new so it's not impossible that I forgot a couple of things in it. I'll take it from there :)
When I can't rebase my mistakes away (because mainatainers can't rebase on a fork PR), it's much harder to do 🦄✨magic🪄🎩
Thank you for your patience !