arq icon indicating copy to clipboard operation
arq copied to clipboard

No exception on job timeout

Open kiriusm2 opened this issue 3 years ago • 10 comments

In v0.20 when job is timed out asyncio.TimeoutError is not raised. Instead there is log message: job_name cancelled, will be run again and job is restarted immediately. Is it intentional? Having some exception on timeout and allow app to decide how to handle it seems more appropriate to me.

kiriusm2 avatar May 11 '21 20:05 kiriusm2

I think asyncio.TimeoutError is raised, it's just caught and the job is re-run.

If you want to do something different, you might be able to catch it inside the job?

samuelcolvin avatar May 11 '21 20:05 samuelcolvin

Inside the job there is asyncio.CancelledError on timeout now. I guess it's because in https://github.com/samuelcolvin/arq/pull/212/ async with async_timeout.timeout(timeout_s): was replaced with cancel_handler = self.loop.call_at(self.loop.time() + timeout_s, task.cancel)

kiriusm2 avatar May 11 '21 21:05 kiriusm2

Humm, the the logic I implemented was copied from async-timeout, so I would have expected the behaviour to be the same.

Guess I'm wrong...

samuelcolvin avatar May 12 '21 08:05 samuelcolvin

https://github.com/aio-libs/async-timeout/blob/master/async_timeout/init.py#L191-L194 async-timeout checks exception type in __exit__ and raises asyncio.TimeoutError if it's asyncio.CancelledError and task was cancelled by async-timeout. I can't find similar logic in arq

kiriusm2 avatar May 12 '21 08:05 kiriusm2

Oh, missed that. Happy to accept a PR if we can fix it.

samuelcolvin avatar May 12 '21 11:05 samuelcolvin

Is it ok to just return async-timeout back or there is some deep reason behind removing it?

kiriusm2 avatar May 12 '21 12:05 kiriusm2

yes, we need access to the task to add it to self.job_tasks

samuelcolvin avatar May 12 '21 12:05 samuelcolvin

For stuff like this, it's worth reading the code rather than asking questions, it's relatively clear in the code:

https://github.com/samuelcolvin/arq/blob/8ee04f96934684c9b8f8278df0530ae9fad1be4a/arq/worker.py#L520-L535

samuelcolvin avatar May 12 '21 12:05 samuelcolvin

Actually inside a job it's always been asyncio.CancelledError on timeout. The difference only in retries. In 0.20 job is retried and previously exception was just logged.

kiriusm2 avatar May 16 '21 09:05 kiriusm2

Humm, I think we should revert this to just log the error, you can always catch the CancelledError and raise a retry exception.

Would you create a PR?

samuelcolvin avatar May 16 '21 11:05 samuelcolvin

This seems to have been resolved.

JonasKs avatar Apr 04 '23 17:04 JonasKs