rq
UnpickleError error handling
Currently when a worker encounters an UnpickleError, the dequeued job goes into limbo until the next time the worker is restarted.
As far as I can tell, this is the workflow that causes me issues:
- A bad deployment goes out to the queue worker files, e.g. a bad import statement
- Every job dequeued by the worker fails to deserialize with an UnpickleError. This exception is not caught, so the job is left with status STARTED and stays in the WIP queue.
- On the next rqworker restart, all of the jobs in the WIP queue get moved to the failed queue, which causes a sudden huge spike in the failed queue.
Metrics on the default and failed queues can't catch this. I could add one on rq:wip:default, but I would prefer not to depend on nitty-gritty implementation details. Is it possible to handle the UnpickleError by moving the job to the failed queue instead?
FWIW, I'm using rq v0.5.5, but looking through the source I didn't find anything that has changed in this regard since then.
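To make the request concrete, here is a rough sketch of the behavior I have in mind. It is not rq's actual code: the `UnpickleError` class, `unpickle`, `process_next`, and the plain lists standing in for the default, WIP, and failed queues are all hypothetical, so that the example runs without rq or Redis. The second payload is a hand-written pickle that references a module that does not exist, which imitates the bad-import deployment described above.

```python
import pickle


class UnpickleError(Exception):
    """Stand-in for rq's UnpickleError so the sketch runs without rq."""


def unpickle(payload):
    """Deserialize a job payload, wrapping any failure in UnpickleError."""
    try:
        return pickle.loads(payload)
    except Exception as exc:  # e.g. ImportError caused by a bad deployment
        raise UnpickleError("could not unpickle job payload") from exc


def process_next(queue, wip, failed, perform):
    """Dequeue one payload; on UnpickleError move it to `failed` right away.

    `queue`, `wip`, and `failed` are plain lists used as hypothetical
    stand-ins for the default, WIP, and failed queues.
    """
    payload = queue.pop(0)
    wip.append(payload)
    try:
        job = unpickle(payload)
    except UnpickleError:
        # Requested behavior: fail the job now instead of leaving it in WIP
        # until the next worker restart.
        wip.remove(payload)
        failed.append(payload)
        return None
    result = perform(job)
    wip.remove(payload)
    return result


# One good payload, and one that references a module that cannot be imported.
queue = [pickle.dumps(("add", 1, 2)), b"cmissing_module_xyz\nfunc\n."]
wip, failed = [], []
results = [
    process_next(queue, wip, failed, lambda job: job[1] + job[2])
    for _ in range(2)
]
```

With this handling, the broken job shows up in the failed queue as soon as it is dequeued, so a metric on the failed queue would catch the bad deployment without looking at the WIP queue.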
@stepzhou there's a PR that attempts to address this https://github.com/nvie/rq/pull/363.
If I remember correctly, the PR is only missing some tests and minor improvements. It would be great if someone could get it up to speed so we can merge it in.