django-q
Tasks are continually retried if the function does not exist
We have a working django-q setup with the following config:
Q_CLUSTER = {
    "orm": "default",
    "timeout": 60,
    "attempt_count": 1,
    "max_attempts": 2,
    "retry": 120,
    "catch_up": False,
    "workers": 1,
    "ack_failures": True,
}
Some tasks were created with their func pointing to a name that no longer exists. When attempting to run these tasks, the cluster reports the following:
21:21:02 [Q] ERROR reincarnated pusher Process-1:48 after sudden death
21:21:02 [Q] INFO Process-1:49 pushing tasks at 117
Process Process-1:49:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.8/site-packages/django_q/cluster.py", line 354, in pusher
    task = SignedPackage.loads(task[1])
  File "/usr/local/lib/python3.8/site-packages/django_q/signing.py", line 25, in loads
    return signing.loads(
  File "/usr/local/lib/python3.8/site-packages/django_q/core_signing.py", line 49, in loads
    return serializer().loads(data)
  File "/usr/local/lib/python3.8/site-packages/django_q/signing.py", line 39, in loads
    return pickle.loads(data)
AttributeError: Can't get attribute 'baz' on <module 'foo.bar' from '/usr/local/lib/python3.8/site-packages/foo/bar.py'>
21:21:03 [Q] ERROR reincarnated pusher Process-1:49 after sudden death
21:21:03 [Q] INFO Process-1:50 pushing tasks at 118
The cluster continually attempts to run these tasks, seemingly ignoring the max_attempts setting. The only way I could get the tasks to stop being attempted was to remove their entries from the OrmQ table.
Is this the expected behavior with respect to max_attempts?
I don't think this is intended behavior.
Reading django_q/cluster.py, here is how the task pusher handles exceptions:
try:
    task = SignedPackage.loads(task[1])
except (TypeError, BadSignature) as e:
    logger.error(e, traceback.format_exc())
    broker.fail(ack_id)
    continue
So if it gets a TypeError or a BadSignature exception, it deletes the task. (Ideally, we'd increment the task's attempt count and let max_attempts take effect rather than deleting the task from the queue as broker.fail() does, but since we can't de-serialize the task, that's not possible.)
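The failure is easy to reproduce with pickle alone. The sketch below (the module and function names are invented stand-ins, mirroring the traceback above) shows both errors a dangling function reference can produce:

```python
import pickle
import sys
import types

# Build a throwaway module containing a function, then pickle a
# reference to it. Pickle stores functions by reference, i.e. only
# the string path "fake_tasks.baz" is serialized.
mod = types.ModuleType("fake_tasks")
exec("def baz():\n    return 42", mod.__dict__)
sys.modules["fake_tasks"] = mod

payload = pickle.dumps(mod.baz)

# Case 1: the module still imports, but the function was renamed or
# removed -- unpickling raises AttributeError, as in the traceback.
del mod.baz
try:
    pickle.loads(payload)
except AttributeError as e:
    print("AttributeError:", e)

# Case 2: the module itself no longer exists -- unpickling raises
# ModuleNotFoundError, a subclass of ImportError.
del sys.modules["fake_tasks"]
try:
    pickle.loads(payload)
except ImportError as e:
    print("ImportError:", e)
```

Because only the dotted path is stored, the error surfaces at unpickle time, which in django-q is inside the pusher process.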
So, to fix this problem, I would suggest adding AttributeError and ImportError [1] to the list of exceptions handled here.
[1]: The pickle documentation says that unpickling can raise both AttributeError and ImportError when the referenced function or module does not exist.
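For illustration, here is a self-contained sketch of the widened handler. FakeBroker, pusher_step, and the BadSignature class below are stand-ins I made up (in django-q, broker, logger, and BadSignature come from the surrounding cluster code), so treat this as the shape of the fix rather than a tested patch:

```python
import traceback

class BadSignature(Exception):
    """Stand-in for django.core.signing.BadSignature."""

class FakeBroker:
    """Stand-in broker that records which tasks were failed."""
    def __init__(self):
        self.failed = []

    def fail(self, ack_id):
        self.failed.append(ack_id)

def pusher_step(raw, ack_id, loads, broker, logger=print):
    """One pusher iteration with the widened except clause:
    AttributeError and ImportError are treated like the existing
    undeliverable-task errors instead of killing the process."""
    try:
        return loads(raw)
    except (TypeError, BadSignature, AttributeError, ImportError) as e:
        logger(e, traceback.format_exc())
        broker.fail(ack_id)  # drop the task instead of retrying forever
        return None

# A loads() that fails the same way the traceback above does:
def broken_loads(raw):
    raise AttributeError("Can't get attribute 'baz' on <module 'foo.bar'>")

broker = FakeBroker()
assert pusher_step(b"payload", 48, broken_loads, broker) is None
assert broker.failed == [48]
```

With this change, a task whose callable has vanished is failed once and removed, instead of repeatedly crashing and reincarnating the pusher.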