
Stuck jobs in queue

Open Olegt0rr opened this issue 2 years ago • 3 comments

Question: Why are some tasks queued but never picked up for execution? New tasks are processed by the workers immediately.

Healthcheck: Jun-20 10:35:06 j_complete=6 j_failed=1 j_retried=0 j_ongoing=1 queued=10

WorkerSettings: max_jobs = 10

Workers: 4 replicas (see screenshot, 2022-06-20 14:39). CPU, RAM and drive space are OK.

Redis: 12 items in the queue (see screenshot, 2022-06-20 14:40), which is OK, because 2 were added after the health check.

Olegt0rr avatar Jun 20 '22 11:06 Olegt0rr

Please look at #343; you also have to check whether the arq:retry or arq:in-progress keys contain the same GUID.
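
For anyone following along, a rough sketch (not from the original thread, and assuming arq's default key prefixes and a local Redis) of listing which job IDs currently hold in-progress or retry keys:

```python
# Rough sketch: list job IDs behind "arq:in-progress:" / "arq:retry:" keys.
# Assumes arq's default key prefixes and a local Redis; adjust the connection
# details for your setup.
import asyncio

from redis.asyncio import Redis


async def main() -> None:
    redis = Redis()
    for prefix in ("arq:in-progress:", "arq:retry:"):
        keys = await redis.keys(prefix + "*")
        job_ids = [k.decode().removeprefix(prefix) for k in keys]
        print(f"{prefix} -> {job_ids}")
    await redis.aclose()


asyncio.run(main())
```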

JonasKs avatar Sep 15 '22 16:09 JonasKs

I increased job_timeout because my job may run for up to 3 days - that's okay for my use case.
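
(Roughly what that looks like in WorkerSettings; a sketch with illustrative values, not the poster's actual config:)

```python
# Illustrative only: raising arq's default per-job timeout (300 seconds) so a
# job is allowed to run for up to 3 days before the worker times it out.
from datetime import timedelta


class WorkerSettings:
    max_jobs = 10
    job_timeout = timedelta(days=3)
```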

Olegt0rr avatar Sep 16 '22 01:09 Olegt0rr

Then it is because of what I wrote in #343 and this is intended. 😊 I think this issue can be closed.

JonasKs avatar Sep 16 '22 06:09 JonasKs

@JonasKs, so what should I do with long tasks? Maybe the retry timeout and the "auto-kill" timeout should be separate?

Olegt0rr avatar Sep 18 '22 00:09 Olegt0rr

The only way would be to build in a health check of some sort from the worker, as suggested in the other issue.

Honestly, what you should do is split your task into multiple steps (tasks), or use func to set a high timeout only on your long-running tasks, so all other tasks can keep a more sensible timeout.
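
A minimal sketch of that second option (task names are illustrative, not from the thread): wrap only the long-running coroutine with arq's func() so it gets its own timeout, while everything else keeps the default.

```python
# Sketch: give one long-running task a 3-day timeout via func(), while other
# tasks keep the worker-wide default.
from datetime import timedelta

from arq import func
from arq.connections import RedisSettings


async def quick_task(ctx):  # illustrative task names
    ...


async def three_day_task(ctx):
    ...


class WorkerSettings:
    redis_settings = RedisSettings()
    max_jobs = 10
    job_timeout = 300  # sensible default (seconds) for everything else
    functions = [
        quick_task,
        func(three_day_task, timeout=timedelta(days=3)),  # long task only
    ]
```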

JonasKs avatar Sep 18 '22 07:09 JonasKs

Good answer.

samuelcolvin avatar Sep 18 '22 08:09 samuelcolvin