django-q

The worker won't try to retry any tasks

Status: Open · galihlprakoso opened this issue 3 years ago · 4 comments

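Q_CLUSTER = {
    'name': 'test',
    'workers': 3,
    'retry': 5,             # seconds before an unacknowledged task is presented again (brokers with delivery receipts only)
    'recycle': 500,
    'timeout': 4,           # seconds a worker may spend on one task; must be less than 'retry'
    'ack_failures': True,   # also acknowledge failed tasks, i.e. treat them as delivered and drop them from the queue
    'max_attempts': 5,      # limit on retry attempts for failed tasks (0 = unlimited)
    'compress': True,
    'save_limit': 250,
    'queue_limit': 500,
    'cpu_affinity': 1,
    'label': 'Test',
    'redis': {
        'host': '127.0.0.1',
        'port': 6379,
        'password': '',
        'db': 0,
    },
}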

With this configuration, the worker simply won't retry failed or timed-out tasks.

Here is a sample log for a timed-out task:

00:29:32 [Q] INFO Q Cluster florida-william-twenty-monkey starting.
00:29:33 [Q] INFO Process-1 guarding cluster florida-william-twenty-monkey
00:29:33 [Q] INFO Q Cluster florida-william-twenty-monkey running.
00:29:33 [Q] INFO Process-1:4 monitoring at 9676
00:29:33 [Q] INFO Process-1:5 pushing tasks at 9677
00:29:33 [Q] INFO Process-1:3 ready for work at 9675
00:29:33 [Q] INFO Process-1:1 ready for work at 9673
00:29:33 [Q] INFO Process-1:2 ready for work at 9674
00:29:35 [Q] INFO Process-1:3 processing [april-beer-music-wisconsin]
TEST
00:29:40 [Q] WARNING reincarnated worker Process-1:3 after timeout
00:29:40 [Q] INFO Process-1:6 ready for work at 9680

And here is a sample log for a failed task:

00:33:49 [Q] INFO Process-1:2 ready for work at 9763
00:33:49 [Q] INFO Process-1:5 pushing tasks at 9766
00:33:49 [Q] INFO Process-1:4 monitoring at 9765
00:33:49 [Q] INFO Process-1:3 ready for work at 9764
00:33:51 [Q] INFO Process-1:1 processing [missouri-oscar-washington-four]
TEST
00:33:51 [Q] ERROR Failed [missouri-oscar-washington-four] - division by zero : Traceback (most recent call last):
  File "/Users/galihlprakoso/Projects/garagegameshop/ggs-server/venv/lib/python3.9/site-packages/django_q/cluster.py", line 432, in worker
    res = f(*task["args"], **task["kwargs"])
  File "/Users/galihlprakoso/Projects/garagegameshop/ggs-server/schedulers/test_auto_retry.py", line 6, in schedule
    return 0 / 0
ZeroDivisionError: division by zero

After either of these happens, the worker just sits idle and never tries to retry the task. Is there anything I'm missing in the configuration?

galihlprakoso, Feb 25 '22 17:02
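For reference, the behaviour the configuration above is expected to produce can be modelled with a toy broker. This is a hypothetical sketch, not django-q's implementation: `ToyReceiptBroker` and `deliveries` are made-up names, and the model assumes a broker that supports delivery receipts (the django-q documentation describes `retry` as applying only to such brokers). It shows how `retry`, `max_attempts` and `ack_failures` interact for a task that always fails.

```python
from dataclasses import dataclass, field


@dataclass
class ToyReceiptBroker:
    """Hypothetical toy model of a broker with delivery receipts (not django-q code).

    A dequeued task stays hidden for `retry` seconds. If it is not acknowledged
    in that window it becomes visible again, until it has been delivered
    `max_attempts` times.
    """
    retry: float
    max_attempts: int
    now: float = 0.0  # fake clock, advanced by the caller
    _ready: list = field(default_factory=list, init=False)
    _in_flight: dict = field(default_factory=dict, init=False)  # task -> visibility deadline
    _attempts: dict = field(default_factory=dict, init=False)   # task -> deliveries so far

    def enqueue(self, task):
        self._ready.append(task)

    def _requeue_expired(self):
        for task, deadline in list(self._in_flight.items()):
            if self.now >= deadline:
                del self._in_flight[task]
                # Present the task again only while it has attempts left.
                if self._attempts[task] < self.max_attempts:
                    self._ready.append(task)

    def dequeue(self):
        self._requeue_expired()
        if not self._ready:
            return None
        task = self._ready.pop(0)
        self._attempts[task] = self._attempts.get(task, 0) + 1
        self._in_flight[task] = self.now + self.retry
        return task

    def ack(self, task):
        # Acknowledged tasks count as delivered and never come back.
        self._in_flight.pop(task, None)


def deliveries(ack_failures, retry=5, max_attempts=5, ticks=100):
    """Count how often a task that always fails is handed to a worker."""
    broker = ToyReceiptBroker(retry=retry, max_attempts=max_attempts)
    broker.enqueue("failing-task")  # hypothetical task name
    count = 0
    for _ in range(ticks):  # one loop iteration per simulated second
        task = broker.dequeue()
        if task is not None:
            count += 1
            # The task raises here (like the ZeroDivisionError in the log).
            if ack_failures:
                broker.ack(task)  # failure acknowledged: removed from the queue
        broker.now += 1
    return count
```

Under this model, `deliveries(ack_failures=True)` returns 1 and `deliveries(ack_failures=False)` returns 5. Acknowledging a failure removes the task from the queue, so with `ack_failures` enabled nothing is left for the broker to present again.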

bump

aliensowo, Jun 14 '22 08:06

I'm seeing the same problem. Which Django and django_q versions are you using?

Grayknife, Jul 08 '22 21:07

Same problem here. When my task fails because of a deadlock, it just doesn't retry.

weberxw, Feb 24 '23 14:02

This is a known issue; there are many posts about the queue locking up. The solution is fairly simple, but it is a lot of work because it needs some solid rewriting: the shared queues need to be replaced by pipes. Pipes are one-to-one, so they don't lock up when a process is killed or stopped. I have started the refactoring here: https://github.com/django-q2/django-q2/pull/78
It's not done yet, but it will resolve this issue and many others.

GDay, Feb 24 '23 15:02
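The pipe-based design described in the last comment can be sketched with the standard library's `multiprocessing.Pipe`. This is a minimal illustration under stated assumptions, not the code from the linked pull request: `worker`, `run_once` and `run_with_retries` are hypothetical names, tasks are `(name, seconds)` tuples that only sleep, and the explicit `"fork"` start method makes it POSIX-only. The Python documentation warns that terminating a process while it is using a pipe or queue can leave that channel corrupted and unusable by other processes; with one private pipe per worker, the only channel at risk is the one thrown away with the killed worker.

```python
import multiprocessing as mp
import time

# POSIX-only assumption: "fork" lets the child reuse this module's functions
# without re-importing it.
ctx = mp.get_context("fork")


def worker(conn):
    """Hypothetical worker: run (name, seconds) tasks sent over its own pipe."""
    while True:
        try:
            task = conn.recv()
        except EOFError:  # the other end was closed
            return
        if task is None:  # sentinel: shut down cleanly
            return
        name, seconds = task
        time.sleep(seconds)  # stand-in for real work
        conn.send((name, "done"))


def run_once(task, timeout):
    """Give `task` to a fresh worker over a dedicated one-to-one pipe."""
    parent_end, child_end = ctx.Pipe()
    proc = ctx.Process(target=worker, args=(child_end,))
    proc.start()
    child_end.close()  # the supervisor keeps only its own end
    parent_end.send(task)
    if parent_end.poll(timeout):  # result arrived in time
        result = parent_end.recv()
        parent_end.send(None)
        proc.join()
    else:
        # Timed out: kill the worker. Only this private pipe may be left in a
        # bad state, and it is discarded together with the worker.
        proc.terminate()
        proc.join()
        result = (task[0], "timeout")
    parent_end.close()
    return result


def run_with_retries(task, timeout, max_attempts):
    """Retry a timed-out task, each time on a brand-new worker and pipe."""
    for attempt in range(1, max_attempts + 1):
        _, status = run_once(task, timeout)
        if status == "done":
            return attempt, status
    return max_attempts, "gave up"
```

In this sketch a task that keeps exceeding its timeout is killed and handed to a replacement worker on every attempt, which mirrors the "reincarnated worker ... after timeout" line in the log above, while no other worker shares the pipe that might have been damaged.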