django-q
Cluster unresponsive after worker is killed
I start up a cluster, wait a minute, then kill the worker processes. New workers are reincarnated, but the cluster refuses to process new jobs (though it processed them fine before). When I then try to shut down the cluster, it hangs after stopping the non-reincarnated members:
$ ./manage.py qcluster
15:19:35 [Q] INFO Q Cluster-7631 starting.
15:19:35 [Q] INFO Process-1:1 ready for work at 7635
15:19:35 [Q] INFO Process-1:2 ready for work at 7636
15:19:35 [Q] INFO Process-1:3 monitoring at 7637
15:19:35 [Q] INFO Process-1 guarding cluster at 7634
15:19:35 [Q] INFO Process-1:4 pushing tasks at 7638
15:19:35 [Q] INFO Q Cluster-7631 running.
(At this point I kill 7635 & 7636 from another window)
15:19:51 [Q] ERROR reincarnated worker Process-1:1 after death
15:19:51 [Q] INFO Process-1:5 ready for work at 7651
15:19:52 [Q] ERROR reincarnated worker Process-1:2 after death
15:19:52 [Q] INFO Process-1:6 ready for work at 7652
(Jobs submitted after this point are ignored by the cluster)
^C16:05:14 [Q] INFO Q Cluster-7631 stopping.
16:05:14 [Q] INFO Process-1 stopping cluster processes
16:05:14 [Q] INFO Process-1:4 stopped pushing tasks
(It's hung here, additional ctrl-C does nothing. Need to ctrl-Z and kill manually.)
^C16:05:17 [Q] INFO Q Cluster-7631 stopping.
^Z
[1]+ Stopped ./manage.py qcluster
(dash) $ kill %1
(dash) $ 16:05:25 [Q] INFO Q Cluster-7631 stopping.
16:05:25 [Q] INFO Q Cluster-7631 has stopped.
16:05:25 [Q] INFO Q Cluster-7631 has stopped.
16:05:25 [Q] INFO Q Cluster-7631 has stopped.
Configuration: Running the latest django-q from pip on Ubuntu 14.04, using the config below.
(dash) $ pip freeze -l
arrow==0.8.0
blessed==1.14.1
Django==1.9.1
django-debug-toolbar==1.3.2
django-mysql==1.0.1
django-picklefield==0.3.2
django-q==0.7.18
flufl.lock==2.4.1
future==0.15.2
ipython==3.1.0
mysqlclient==1.3.6
pyaml==15.3.1
python-dateutil==2.5.3
pytz==2015.2
PyYAML==3.11
requests==2.8.1
six==1.10.0
sqlparse==0.1.15
wcwidth==0.1.7
(dash) $ python -V
Python 3.4.3
(dash) $ uname -a
Linux orthrus 3.13.0-98-generic #145-Ubuntu SMP Sat Oct 8 20:13:07 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Q_CLUSTER = {
    'name': 'dash',
    'workers': 2,
    'recycle': 1,
    'timeout': 6000,
    'retry': 6060,
    'compress': False,
    'orm': 'default',
}
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.filebased.FileBasedCache',
        'LOCATION': '/tmp/django_cache',
    }
}
+1 Got the same issue
+1 Got the same issue. My findings after a reincarnation:
- The "pusher" is still connected to the broker and so still receives messages from redis (or another broker)
- The "pusher" pushes messages into the multiprocessing queue without error
- The newly reincarnated worker waits on the task_queue (multiprocessing.Queue) but never returns from task_queue.get
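The symptom in that last point can be illustrated outside django-q. This is only a minimal sketch with the standard library, not django-q's actual worker loop: using a bounded get() turns a queue that never delivers into a detectable timeout instead of a silent hang.

```python
import queue
from multiprocessing import Queue

def get_task(task_queue, timeout=2.0):
    """Fetch one task; return None on timeout instead of blocking forever."""
    try:
        return task_queue.get(timeout=timeout)
    except queue.Empty:
        return None

if __name__ == '__main__':
    tq = Queue()
    tq.put('task-1')
    print(get_task(tq))               # the queued task comes back
    print(get_task(tq, timeout=0.2))  # empty queue: None instead of a hang
```

A worker built this way could log each timeout, which would have made the "reincarnated worker never returns from get" state visible in the cluster logs.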
My workaround: I've added a scheduled task that pushes a metric (a kind of ping) to CloudWatch (I'm on AWS), and I trigger a restart of the whole cluster if no ping arrives for more than X minutes.
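The restart decision in that watchdog is simple enough to sketch. The function name and parameters below are made up for illustration; the CloudWatch publishing and the actual restart mechanism are left out, only the freshness check is shown:

```python
import time

def cluster_needs_restart(last_ping, max_silence_s, now=None):
    """True when the most recent ping is older than max_silence_s seconds.

    last_ping: unix timestamp of the last ping metric seen.
    max_silence_s: the "X minutes" threshold, in seconds.
    """
    if now is None:
        now = time.time()
    return (now - last_ping) > max_silence_s

# Example: last ping 10 minutes ago with a 5-minute threshold -> restart.
assert cluster_needs_restart(last_ping=0.0, max_silence_s=300, now=600.0)
assert not cluster_needs_restart(last_ping=0.0, max_silence_s=300, now=120.0)
```

In the setup described above, the scheduled task supplies the pings and an external alarm evaluates this condition, so the check survives even when the cluster itself is wedged.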
Hi, I'm having the same problem.
I want to be able to terminate the cluster but it hangs after I press Control+C:
^C13:56:07 [Q] INFO Q Cluster-18641 stopping.
I am also seeing this same issue
+1 Got the same issue
I think this error points to a mistake in our own code; maybe one of the variables or arguments is incorrect.
Same issue for us as well. Any news on this?