
Cluster unresponsive after worker is killed

Open · krid opened this issue 8 years ago · 7 comments

I start up a cluster, wait a minute, then kill the worker processes. New workers are incarnated, but the cluster refuses to process new jobs (though it processed them just fine before). When I then try to shut down the cluster it hangs after killing the non-reincarnated members:

$ ./manage.py qcluster
15:19:35 [Q] INFO Q Cluster-7631 starting.
15:19:35 [Q] INFO Process-1:1 ready for work at 7635
15:19:35 [Q] INFO Process-1:2 ready for work at 7636
15:19:35 [Q] INFO Process-1:3 monitoring at 7637
15:19:35 [Q] INFO Process-1 guarding cluster at 7634
15:19:35 [Q] INFO Process-1:4 pushing tasks at 7638
15:19:35 [Q] INFO Q Cluster-7631 running.

(At this point I kill 7635 & 7636 from another window)

15:19:51 [Q] ERROR reincarnated worker Process-1:1 after death
15:19:51 [Q] INFO Process-1:5 ready for work at 7651
15:19:52 [Q] ERROR reincarnated worker Process-1:2 after death
15:19:52 [Q] INFO Process-1:6 ready for work at 7652

(Jobs submitted after this point are ignored by the cluster)
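(For reference, "submitting a job" here just means the normal enqueue call. A hypothetical minimal version is shown below; on the 0.7.x release in use the function was still called async, it was renamed async_task in 1.0.)

from django_q.tasks import async_task, result

# Hypothetical illustration only, not the reporter's exact code.
task_id = async_task('math.sqrt', 16)   # task row is written to the ORM broker
print(result(task_id, wait=5000))       # stays None: no worker ever picks it up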

^C16:05:14 [Q] INFO Q Cluster-7631 stopping.
16:05:14 [Q] INFO Process-1 stopping cluster processes
16:05:14 [Q] INFO Process-1:4 stopped pushing tasks

(It's hung here, additional ctrl-C does nothing. Need to ctrl-Z and kill manually.)

^C16:05:17 [Q] INFO Q Cluster-7631 stopping.
^Z
[1]+  Stopped                 ./manage.py qcluster
(dash) $ kill %1
(dash) $ 16:05:25 [Q] INFO Q Cluster-7631 stopping.
16:05:25 [Q] INFO Q Cluster-7631 has stopped.
16:05:25 [Q] INFO Q Cluster-7631 has stopped.
16:05:25 [Q] INFO Q Cluster-7631 has stopped.

Configuration: Running the latest django-q from pip on Ubuntu 14.04, using the config below.

(dash) $ pip freeze -l
arrow==0.8.0
blessed==1.14.1
Django==1.9.1
django-debug-toolbar==1.3.2
django-mysql==1.0.1
django-picklefield==0.3.2
django-q==0.7.18
flufl.lock==2.4.1
future==0.15.2
ipython==3.1.0
mysqlclient==1.3.6
pyaml==15.3.1
python-dateutil==2.5.3
pytz==2015.2
PyYAML==3.11
requests==2.8.1
six==1.10.0
sqlparse==0.1.15
wcwidth==0.1.7
(dash) $ python -V
Python 3.4.3
(dash) $ uname -a
Linux orthrus 3.13.0-98-generic #145-Ubuntu SMP Sat Oct 8 20:13:07 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Q_CLUSTER = {
    'name': 'dash',
    'workers': 2,        # two worker processes
    'recycle': 1,        # recycle (restart) each worker after every task
    'timeout': 6000,     # seconds a task may run before it is killed
    'retry': 6060,       # seconds before an unacknowledged task is retried
    'compress': False,
    'orm': 'default',    # use the Django ORM (default database) as the broker
}
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.filebased.FileBasedCache',
        'LOCATION': '/tmp/django_cache',
    }
}

krid avatar Oct 20 '16 23:10 krid

+1 Got the same issue

frank-u avatar Oct 31 '16 16:10 frank-u

+1 Got the same issue. My findings after a reincarnation:

  • The "pusher" is still connected to the broker and thus still receives messages from Redis (or whichever broker is in use)
  • The "pusher" pushes messages into the multiprocessing queue without error
  • The newly reincarnated worker waits on the task_queue (multiprocessing.Queue) but never returns from task_queue.get

My workaround: I've added a scheduled task that pushes a metric (a kind of ping) to CloudWatch (I'm on AWS), and I trigger a restart of the whole cluster if no ping arrives for more than X minutes.
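A minimal sketch of that kind of heartbeat, assuming boto3 with working AWS credentials; the namespace, metric name and interval are made up for illustration and may differ from the commenter's actual setup:

# heartbeat.py (hypothetical module)
import boto3

def push_heartbeat():
    """Scheduled django-q task: push a 'cluster is alive' data point to CloudWatch."""
    boto3.client('cloudwatch').put_metric_data(
        Namespace='DjangoQ',
        MetricData=[{'MetricName': 'heartbeat', 'Value': 1.0, 'Unit': 'Count'}],
    )

# Registered once, e.g. from a shell or data migration:
#   from django_q.tasks import schedule
#   from django_q.models import Schedule
#   schedule('heartbeat.push_heartbeat', schedule_type=Schedule.MINUTES, minutes=5)

Because the heartbeat itself runs through the cluster, a wedged cluster stops emitting data points, and a CloudWatch alarm on missing data can then trigger the restart.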

gchardon-hiventy avatar Nov 08 '16 14:11 gchardon-hiventy

Hi I'm having the same problem.

I want to be able to terminate the cluster but it hangs after I press Control+C:

^C13:56:07 [Q] INFO Q Cluster-18641 stopping.
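A blunt workaround until the hang is fixed, assuming nothing else on the machine matches the pattern, is to force-kill the stuck cluster from another shell:

$ pkill -9 -f "manage.py qcluster"   # SIGKILL every process whose command line matches

SIGKILL skips the normal shutdown, so any in-flight task is left to the broker's retry/acknowledgement handling.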

raonyguimaraes avatar Feb 27 '18 13:02 raonyguimaraes

I am also seeing this same issue

kbuilds avatar Nov 01 '18 19:11 kbuilds

+1 Got the same issue

aliensowo avatar Jun 14 '22 08:06 aliensowo

I think this error points to a mistake in our own code; maybe one of the variables or arguments is incorrect.

asifaliek avatar Jun 24 '22 11:06 asifaliek

Same issue for us as well. Any news on this?

stelios-gasparinatos avatar Dec 01 '22 14:12 stelios-gasparinatos