dramatiq
dramatiq copied to clipboard
Worker restarts automatically
Issues
I have created workers on two servers and according to the logs, the workers on both servers are restarting automatically at the same time, I don't understand why this is.
logs:
21:48:01,102 [ALERT] export:64 - The subdomain result for jd.co.th: /root/Gleam/plugins/domain/oneforall/results/jd.co.th.csv
21:48:01,102 [INFOR] oneforall:258 - Finished OneForAll
[2022-01-22 21:48:01,149] [PID 16211] [Thread-8] [tasks.normal.domain.oneforall_domain_task.oneforall_task] [DEBUG] Completed after 38457.04ms.
[2022-01-22 21:48:01,149] [PID 16211] [Thread-8] [dramatiq.worker.ConsumerThread(domain)] [DEBUG] Acknowledging message '1ae067af-1bf7-4c30-99fb-4499bf8720b0'.
[2022-01-22 22:10:50,444] [PID 16210] [MainThread] [dramatiq.WorkerProcess(0)] [INFO] Stopping worker process...
[2022-01-22 22:10:50,443] [PID 16211] [MainThread] [dramatiq.WorkerProcess(1)] [INFO] Stopping worker process...
[2022-01-22 22:10:50,444] [PID 16244] [MainThread] [dramatiq.ForkProcess(0)] [INFO] Stopping fork process...
[2022-01-22 22:10:50,446] [PID 16209] [MainThread] [dramatiq.MainProcess] [INFO] Sending signal 1 to subprocesses...
[2022-01-22 22:10:50,446] [PID 16210] [MainThread] [dramatiq.WorkerProcess(0)] [WARNING] Killing worker process...
[2022-01-22 22:10:50,446] [PID 16211] [MainThread] [dramatiq.WorkerProcess(1)] [WARNING] Killing worker process...
[2022-01-22 22:10:52,059] [PID 16209] [MainThread] [dramatiq.MainProcess] [INFO] Dramatiq '1.12.2' is booting up.
[2022-01-22 22:10:51,524] [PID 22623] [MainThread] [dramatiq.WorkerProcess(0)] [DEBUG] Loading broker...
[2022-01-22 22:10:51,983] [PID 22623] [MainThread] [dramatiq.middleware.prometheus.Prometheus] [DEBUG] Setting up metrics...
[2022-01-22 22:10:51,983] [PID 22623] [MainThread] [dramatiq.WorkerProcess(0)] [DEBUG] Loading modules...
[2022-01-22 22:10:52,054] [PID 22623] [MainThread] [dramatiq.WorkerProcess(0)] [DEBUG] Sending forks to main process...
[2022-01-22 22:10:52,055] [PID 22623] [MainThread] [dramatiq.WorkerProcess(0)] [DEBUG] Starting worker threads...
[2022-01-22 22:10:52,055] [PID 22623] [MainThread] [dramatiq.worker._WorkerMiddleware] [DEBUG] Adding consumer for queue 'domain'.
[2022-01-22 22:10:52,055] [PID 22623] [Thread-3] [dramatiq.worker.ConsumerThread(domain)] [DEBUG] Running consumer thread...
[2022-01-22 22:10:52,055] [PID 22623] [MainThread] [dramatiq.worker._WorkerMiddleware] [DEBUG] Adding consumer for delay queue 'domain.DQ'.
[2022-01-22 22:10:52,055] [PID 22623] [Thread-4] [dramatiq.worker.ConsumerThread(domain.DQ)] [DEBUG] Running consumer thread...
[2022-01-22 22:10:52,056] [PID 22623] [Thread-5] [dramatiq.worker.WorkerThread] [DEBUG] Running worker thread...
[2022-01-22 22:10:52,056] [PID 22623] [Thread-6] [dramatiq.worker.WorkerThread] [DEBUG] Running worker thread...
[2022-01-22 22:10:52,056] [PID 22623] [Thread-7] [dramatiq.worker.WorkerThread] [DEBUG] Running worker thread...
[2022-01-22 22:10:52,056] [PID 22623] [Thread-8] [dramatiq.worker.WorkerThread] [DEBUG] Running worker thread...
[2022-01-22 22:10:52,056] [PID 22623] [Thread-9] [dramatiq.worker.WorkerThread] [DEBUG] Running worker thread...
[2022-01-22 22:10:52,057] [PID 22623] [Thread-10] [dramatiq.worker.WorkerThread] [DEBUG] Running worker thread...
[2022-01-22 22:10:52,057] [PID 22623] [Thread-11] [dramatiq.worker.WorkerThread] [DEBUG] Running worker thread...
[2022-01-22 22:10:52,057] [PID 22623] [Thread-12] [dramatiq.worker.WorkerThread] [DEBUG] Running worker thread...
[2022-01-22 22:10:52,057] [PID 22623] [MainThread] [dramatiq.WorkerProcess(0)] [INFO] Worker process is ready for action.
[2022-01-22 22:10:51,526] [PID 22624] [MainThread] [dramatiq.WorkerProcess(1)] [DEBUG] Loading broker...
[2022-01-22 22:10:51,983] [PID 22624] [MainThread] [dramatiq.middleware.prometheus.Prometheus] [DEBUG] Setting up metrics...
[2022-01-22 22:10:51,983] [PID 22624] [MainThread] [dramatiq.WorkerProcess(1)] [DEBUG] Loading modules...
[2022-01-22 22:10:52,055] [PID 22624] [MainThread] [dramatiq.WorkerProcess(1)] [DEBUG] Starting worker threads...
[2022-01-22 22:10:52,055] [PID 22624] [MainThread] [dramatiq.worker._WorkerMiddleware] [DEBUG] Adding consumer for queue 'domain'.
[2022-01-22 22:10:52,055] [PID 22624] [Thread-3] [dramatiq.worker.ConsumerThread(domain)] [DEBUG] Running consumer thread...
[2022-01-22 22:10:52,055] [PID 22624] [MainThread] [dramatiq.worker._WorkerMiddleware] [DEBUG] Adding consumer for delay queue 'domain.DQ'.
[2022-01-22 22:10:52,055] [PID 22624] [Thread-4] [dramatiq.worker.ConsumerThread(domain.DQ)] [DEBUG] Running consumer thread...
[2022-01-22 22:10:52,056] [PID 22624] [Thread-5] [dramatiq.worker.WorkerThread] [DEBUG] Running worker thread...
[2022-01-22 22:10:52,056] [PID 22624] [Thread-6] [dramatiq.worker.WorkerThread] [DEBUG] Running worker thread...
[2022-01-22 22:10:52,056] [PID 22624] [Thread-7] [dramatiq.worker.WorkerThread] [DEBUG] Running worker thread...
[2022-01-22 22:10:52,056] [PID 22624] [Thread-8] [dramatiq.worker.WorkerThread] [DEBUG] Running worker thread...
[2022-01-22 22:10:52,057] [PID 22624] [Thread-9] [dramatiq.worker.WorkerThread] [DEBUG] Running worker thread...
[2022-01-22 22:10:52,057] [PID 22624] [Thread-10] [dramatiq.worker.WorkerThread] [DEBUG] Running worker thread...
[2022-01-22 22:10:52,057] [PID 22624] [Thread-11] [dramatiq.worker.WorkerThread] [DEBUG] Running worker thread...
[2022-01-22 22:10:52,057] [PID 22624] [Thread-12] [dramatiq.worker.WorkerThread] [DEBUG] Running worker thread...
[2022-01-22 22:10:52,057] [PID 22624] [MainThread] [dramatiq.WorkerProcess(1)] [INFO] Worker process is ready for action.
[2022-01-22 22:10:52,059] [PID 22657] [MainThread] [dramatiq.ForkProcess(0)] [DEBUG] Loading fork function...
[2022-01-22 22:10:52,059] [PID 22657] [MainThread] [dramatiq.ForkProcess(0)] [INFO] Fork process 'dramatiq.middleware.prometheus:_run_exposition_server' is ready for action.
[2022-01-22 22:10:52,060] [PID 22657] [MainThread] [dramatiq.middleware.prometheus._run_exposition_server] [DEBUG] Starting exposition server...
[2022-01-22 22:52:05,127] [PID 22623] [Thread-4] [dramatiq.worker.ConsumerThread(domain.DQ)] [DEBUG] Pushing message 'c7b6dcac-a064-4532-8777-f61532e32a49' onto delay queue.
[2022-01-22 22:52:05,127] [PID 22623] [Thread-4] [dramatiq.worker.ConsumerThread(domain.DQ)] [DEBUG] Pushing message 'fb4aad49-ae9f-41ce-a330-2fb028852226' onto delay queue.
[2022-01-22 22:52:05,127] [PID 22623] [Thread-4] [dramatiq.worker.ConsumerThread(domain.DQ)] [DEBUG] Pushing message 'dbfb8b8e-2939-4eaf-9c74-e96fa8204e1e' onto delay queue.
[2022-01-22 22:52:05,128] [PID 22623] [Thread-4] [dramatiq.worker.ConsumerThread(domain.DQ)] [DEBUG] Pushing message 'c20d2b37-3955-4ab0-b6e6-5da741779b3f' onto delay queue.
[2022-01-22 22:52:05,128] [PID 22623] [Thread-4] [dramatiq.worker.ConsumerThread(domain.DQ)] [DEBUG] Pushing message '855b2581-e623-40b2-827e-0eefdfb23ce0' onto delay queue.
As you can see, the worker that was still executing the task at 21 was automatically restarted at 22.
After debugging, I found that the following method is automatically called:
def termhandler(signum, frame):
nonlocal running
if running:
logger.info("Stopping worker process...")
running = False
else:
logger.warning("Killing worker process...")
return sys.exit(RET_KILLED)
logger.info("Worker process is ready for action.")
signal.signal(signal.SIGINT, signal.SIG_IGN)
signal.signal(signal.SIGTERM, termhandler)
if hasattr(signal, "SIGHUP"):
signal.signal(signal.SIGHUP, termhandler)
if hasattr(signal, "SIGBREAK"):
signal.signal(signal.SIGBREAK, termhandler)
Is there a possibility that these signals are generated while the program is running? non-artificial behavior.
I'm experiencing same problem. I'm running dramatiq-gevent (1.10.0) inside docker container on DIgitalOcean droplet.
CMD ["dramatiq-gevent", "tasks", "-p", "8", "-t", "60", "-v"]
After I start container, dramatiq starts processing messages. After 1-2 minutes, this appears in logs:
Stopping worker process...
Which means termhandler
was called. I have 8 processes and term handler seems to be called only 7 times. And after ~12 minutes workers shut down
Channels and connections closed
And restarted
Running worker thread
This happened multiple times today. So, example timeline:
23:49:25 Started container
23:51:45 Got signal (Stopping worker process...
)
00:04:31 Workers stopped (Channels and connections closed
)
00:05:10 Workers restarted (Running worker thread
)
I have no idea why this is happening :( I use dramatiq to perform really long tasks and they aren't idempotent, so this breaks all my code :(
UPD: continued my research, it seems one of workers exited unexpectedly
Worker with PID 13 exited unexpectedly (code -9). Shutting down...
@tlmn-local did you have any luck with debugging/fixing this?
@tlmn-local did you have any luck with debugging/fixing this?
I think it's a redis problem, when I try to change the broker to rabbitmq the problem disappears.
Meh, I already use rabbitmq
So, I figured this out. I'm feeling kinda dumb, but well, maybe this will be useful to someone who stumbles on this issue in future.
I checked kernel logs (/var/log/kern.log and /var/log/kern.log.1) and found out that worker was shut by OOMKiller. This did not make much sense for me, since workers didn't used even half of available 16GB. But then I remembered that in past these workers were deployed on smaller Droplet with 4GB of ram and I set 3GB memory limit in docker-compose file.
mem_limit: 3g
And that was the issue. Lifting up memory limit solved it
This behavior is expected so things are working as intended. If workers die for any reason, the main process restarts everything.
Worker with PID 13 exited unexpectedly (code -9). Shutting down...
That error code indicates that the process was SIGKILL
ed (as you later found out, by the OOM killer).