arq
Feature Request: Adjust in_progress_key timeout and implement additional heartbeat functionality
Related to issue #402, we have some long-running tasks that may last for hours. Currently, if a worker encounters a failure, the task is only retried after the in_progress_key expires, which is based on the max_timeout (potentially many hours).
https://github.com/samuelcolvin/arq/blob/9109c2e59d2b13fa59d246da03d19d7844a6fa19/arq/worker.py#L264
A huge enhancement would be to reduce the default self.in_progress_timeout_s to a much smaller value, like 10 seconds. The worker could then refresh the in_progress_key expiration on every heartbeat, extending it by a few seconds each time. This would ensure that jobs are retried promptly if a worker fails, rather than only after a long timeout.
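To illustrate the difference, here is a minimal, deterministic Python simulation (no Redis required) that compares a single key whose TTL is set once when the job starts with a short lease renewed on every heartbeat. It is a sketch under assumptions, not arq code: FakeClock, TTLStore, retry_delay_after_crash and the key name are hypothetical stand-ins, TTLStore.expire is assumed to behave like Redis EXPIRE (it only updates a key that still exists), and the 3-hour TTL, 10 s lease and 5 s heartbeat are illustrative figures.

```python
from typing import Dict, Optional


class FakeClock:
    """Manually advanced clock so the simulation is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def advance(self, seconds: float) -> None:
        self.now += seconds


class TTLStore:
    """Hypothetical in-memory stand-in for Redis keys with a TTL (not arq code)."""

    def __init__(self, clock: FakeClock) -> None:
        self.clock = clock
        self._deadlines: Dict[str, float] = {}

    def set_with_ttl(self, key: str, ttl_s: float) -> None:
        self._deadlines[key] = self.clock.now + ttl_s

    def exists(self, key: str) -> bool:
        deadline = self._deadlines.get(key)
        if deadline is None:
            return False
        if self.clock.now >= deadline:
            del self._deadlines[key]  # lazily drop the expired key
            return False
        return True

    def expire(self, key: str, ttl_s: float) -> bool:
        # Assumed to mirror Redis EXPIRE: only a key that still exists gets a new TTL.
        if not self.exists(key):
            return False
        self._deadlines[key] = self.clock.now + ttl_s
        return True


def retry_delay_after_crash(
    lease_ttl_s: float,
    crash_at_s: float,
    heartbeat_s: Optional[float] = None,
    step_s: float = 1.0,
) -> float:
    """Seconds between a worker crash and its in-progress key lapsing.

    heartbeat_s=None models one key whose TTL is set once at job start;
    otherwise the key's TTL is reset to lease_ttl_s on every heartbeat.
    """
    clock = FakeClock()
    store = TTLStore(clock)
    key = "in-progress:job-1"  # hypothetical key name
    store.set_with_ttl(key, lease_ttl_s)
    next_beat = heartbeat_s if heartbeat_s is not None else float("inf")

    # Worker is alive: renew the lease on each heartbeat until it crashes.
    while clock.now < crash_at_s:
        clock.advance(step_s)
        if heartbeat_s is not None and crash_at_s > clock.now >= next_beat:
            if not store.expire(key, lease_ttl_s):
                raise RuntimeError("lease lapsed while the job was still running")
            next_beat += heartbeat_s

    # Worker is dead: nothing renews the key, so wait for it to expire.
    while store.exists(key):
        clock.advance(step_s)
    return clock.now - crash_at_s


if __name__ == "__main__":
    # Illustrative figures: worker crashes 10 minutes into the job.
    print("single long TTL:", retry_delay_after_crash(3 * 3600, crash_at_s=600))
    print("heartbeat lease:", retry_delay_after_crash(10, crash_at_s=600, heartbeat_s=5))
```

With these assumed numbers, the single 3-hour TTL leaves the job stuck for 10,200 seconds after the crash, while the 10 s lease renewed every 5 s lets another worker pick it up about 5 seconds later. The trade-off is that the heartbeat interval must stay comfortably below the lease TTL: if a heartbeat arrives after the key has already lapsed, the job could be picked up twice, which the sketch surfaces as a RuntimeError.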
This would be incredibly helpful for handling worker failures on long-running tasks.
I agree. PR welcome, but we’d have to solve #405 first.