UnpickleError error handling

Open stepzhou opened this issue 9 years ago • 1 comment

Currently when a worker encounters an UnpickleError, the dequeued job goes into limbo until the next time the worker is restarted.

As far as I can tell, this is the workflow that causes me issues:

  1. A bad deployment goes out to the queue worker files, e.g. a bad import statement
  2. Every job dequeued by the worker fails to deserialize with an UnpickleError. The exception is not caught, so the job stays in the WIP queue with status STARTED.
  3. On the next rqworker restart, all of the jobs in the WIP queue are moved to the failed queue, which causes a sudden, large spike in the failed queue.

Metrics on the default and failed queues can't catch this. I could add one on rq:wip:default, but I would prefer not to depend on implementation details. Is it possible to handle the UnpickleError by moving the job to the failed queue instead?
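
To make the request concrete, here is a minimal sketch of the behavior being asked for. It does not use rq at all: the queues are plain Python lists, `UnpickleError` is redefined locally as a stand-in, and `load_job` and `process_next` are hypothetical helpers invented for illustration. The bad payload simulates a broken deployment by referencing a module that cannot be imported.

```python
import pickle


class UnpickleError(Exception):
    """Local stand-in for rq's UnpickleError (assumption: not imported from rq)."""


def load_job(raw):
    """Deserialize a payload, wrapping any failure in UnpickleError."""
    try:
        return pickle.loads(raw)
    except Exception as exc:
        raise UnpickleError("could not unpickle job payload") from exc


def process_next(queue, wip, failed, done):
    """Dequeue one payload and route it (hypothetical helper, not rq code)."""
    raw = queue.pop(0)
    wip.append(raw)  # the job is now "in progress"
    try:
        func, args = load_job(raw)
    except UnpickleError:
        # Requested behavior: fail the job right away instead of
        # leaving it in the WIP list until the worker restarts.
        wip.remove(raw)
        failed.append(raw)
        return False
    result = func(*args)
    wip.remove(raw)
    done.append(result)
    return True


# A valid payload: call len("abc").
good = pickle.dumps((len, ("abc",)))
# A payload that references a module that does not exist,
# similar to what a bad import in a deployment would cause.
bad = b"cno_such_module\nhandler\n."

queue, wip, failed, done = [good, bad], [], [], []
while queue:
    process_next(queue, wip, failed, done)
```

With this routing, the bad payload ends up in `failed` as soon as it is dequeued and `wip` is empty afterwards, so a metric on the failed queue would fire during the bad deployment rather than at the next restart.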

FWIW, I'm using rq v0.5.5, but looking through the source I didn't find any changes to this behavior in later versions.

stepzhou avatar May 18 '16 18:05 stepzhou

@stepzhou there's a PR that attempts to address this: https://github.com/nvie/rq/pull/363

If I remember correctly, the PR is only missing some tests and minor improvements. It would be great if someone can get it up to speed so we can merge it in.

selwin avatar Jun 14 '16 02:06 selwin