asynq

Why are we cleaning the cron entries on shutdown?

Open Kenan7 opened this issue 1 year ago • 6 comments

We have a few different types of periodic jobs running (monthly, weekly, daily).

But every time our services (pods) restart, the progress for these jobs is lost, and they have to start from the beginning.

I was researching this and came across this code. I wanted to know how relevant it is and why it was designed this way. Maybe resuming was difficult to handle, so it was left out until it could be fixed.

image

https://github.com/hibiken/asynq/commit/6529a1e0b1289d01f13229f450a5a0904e162a2c

In general, how can we resolve losing the progress of these entries?

Kenan7, Feb 04 '24 12:02

clearHistory ultimately clears the enqueue event history of all scheduler entries. I don't think this should be an issue, as it just prevents the db from growing unnecessarily. The heartbeater does clear the scheduler entries during shutdown. I am assuming the task entries are somehow utilized within your code?

Could you describe how you expected the scheduler to work? Are these long running jobs?

kamikazechaser, Feb 04 '24 13:02

it just prevents the db from growing unnecessarily

I understand this, but maybe we should add a condition so that queued and unfinished jobs are not removed, since otherwise we lose the progress?

I am assuming the task entries are somehow utilized within your code?

Also, what do you mean by this?


So we are leveraging this usage in asynq

image

Let's say the daily periodic job is due to be handled in 55 minutes and the pod restarts. The scheduler now starts counting the full 24 hours again, so the job is no longer triggered after 55 minutes.

The alternative is to use fixed times, like every day at 6:00 PM, but that just does not suit the business needs.

Kenan7, Feb 04 '24 13:02

There needs to be a job that is handled after 55 minutes, the pod restarts, now it is beginning to calculate again for 24 hours, it is not going to be triggered after 55 minutes anymore.

Thanks for clarifying. I'll investigate and try and reproduce this issue.

kamikazechaser, Feb 05 '24 06:02

Did you find a way to prevent deletion? @Kenan7 @kamikazechaser

Haji-sudo, Feb 27 '24 14:02

Unfortunately not, @Haji-sudo.

Kenan7, Feb 27 '24 17:02

Same issue here, we have exactly the same use case with monthly / weekly jobs. We don't mind losing progress of ongoing jobs, as we keep track of the progress at a job level. But we would expect the job to run again on startup, as it has not been finished successfully.

This is especially visible with long-duration jobs scheduled like so: @every week or @every month. We have jobs running for hours or even days. Say the server restarts after 1 day: we would have to wait another week or month before the job runs again.

lsgndln, Feb 27 '24 18:02