Feature Request: Adjust in_progress_key timeout and implement additional heartbeat functionality

Open · noncuro opened this issue 1 year ago · 1 comment

Related to issue #402: we have some long-running tasks that may last for hours. Currently, if a worker fails, the task is only retried after the in_progress_key expires. That expiry is based on the max_timeout, so the wait can be many hours.

https://github.com/samuelcolvin/arq/blob/9109c2e59d2b13fa59d246da03d19d7844a6fa19/arq/worker.py#L264

A huge enhancement would be to reduce the default self.in_progress_timeout_s to something like 10 seconds. The worker could then renew the in_progress_key expiration on every heartbeat, extending it by a few seconds each time. Jobs would then be retried promptly if a worker fails, rather than waiting out a long timeout.
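To make the idea concrete, here is a rough sketch of the renewal loop, not arq's actual implementation. It assumes "extending" means resetting the key's TTL on each heartbeat. `FakeKeyStore`, `run_with_heartbeat`, `ttl_s`, and `heartbeat_s` are hypothetical names; the in-memory store only stands in for Redis key expiry so the example runs without a server.

```python
import asyncio
import contextlib
import time


class FakeKeyStore:
    """In-memory stand-in for Redis key expiry (hypothetical, illustration only)."""

    def __init__(self):
        self._expires_at = {}

    def set_with_ttl(self, key, ttl_s):
        # Setting or renewing a key pushes its deadline ttl_s seconds into the future.
        self._expires_at[key] = time.monotonic() + ttl_s

    def exists(self, key):
        deadline = self._expires_at.get(key)
        return deadline is not None and time.monotonic() < deadline

    def delete(self, key):
        self._expires_at.pop(key, None)


async def run_with_heartbeat(store, key, job, ttl_s=10.0, heartbeat_s=3.0):
    """Run `job` while a background task keeps renewing a short-lived key.

    If the worker process dies, the renewals stop and the key lapses after at
    most ttl_s seconds, regardless of how long the job was allowed to run.
    """

    async def heartbeat():
        while True:
            store.set_with_ttl(key, ttl_s)
            await asyncio.sleep(heartbeat_s)

    store.set_with_ttl(key, ttl_s)
    beat = asyncio.create_task(heartbeat())
    try:
        return await job()
    finally:
        # Stop renewing and release the key once the job finishes or raises.
        beat.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await beat
        store.delete(key)


async def demo():
    store = FakeKeyStore()
    seen = []

    async def long_job():
        # Runs longer than the TTL, so the key only survives thanks to renewals.
        await asyncio.sleep(0.5)
        seen.append(store.exists("in_progress:job1"))
        return "done"

    result = await run_with_heartbeat(
        store, "in_progress:job1", long_job, ttl_s=0.2, heartbeat_s=0.05
    )
    return result, seen[0], store.exists("in_progress:job1")
```

With a real Redis backend, the renewal step would presumably be a single expiry-refresh command on the existing key. The heartbeat interval needs to be comfortably shorter than the TTL so a slow event loop does not let the key lapse while the job is still healthy.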

This would be incredibly helpful for handling worker failures on long-running tasks.

noncuro · Jun 09 '23 18:06

I agree. PR welcome, but we’d have to solve #405 first.

JonasKs · Jun 09 '23 19:06