overmind
Overmind does not detect crashed process
I'm using overmind to run three processes. One of them, "api" (a Node.js process), ran out of memory and crashed. However, overmind still thinks that it's running.
app-user@machine:/app$ overmind ps
PROCESS PID STATUS
nginx 341 running
worker 343 running
api 346 running
app-user@machine:/app$ ps aux|grep 346
app-user 346 0.0 0.0 0 0 ? Zs May09 0:00 [sh] <defunct>
app-user 1092 0.0 0.1 3328 1608 pts/2 S+ 01:26 0:00 grep 346
It looks like the "api" process (PID 346) has become a zombie, but overmind has not detected it.
Overmind version: 2.4.0
Operating system: Debian bookworm, based off the docker image node:20.11.1-bookworm-slim, and running on fly.io
This issue has happened on two different machines, but I'm really struggling to reproduce it. It might be a tmux issue; it sounds similar to https://github.com/tmux/tmux/issues/311, but I really don't know.
I run into the same issue from time to time. It happened on an earlier version of overmind; I recently upgraded to the latest, 2.5.1, and it's still happening. I think the zombie process is the shell process, which in turn runs the app process.
I spent a day trying to debug this issue without much success. I suspect that it's actually a tmux bug, but I haven't been able to figure out a reliable way to reproduce it.
Hey there,
This is definitely a tmux bug: it is not handling SIGCHLD properly.
From Overmind's point of view, the process is still running, since Overmind can still send signals to it. The only ways to check whether a process is in the zombie state are to read its state file or to use the ps command. Neither is well suited to polling at short intervals. And I believe that it's not Overmind's duty to kill zombies.
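For anyone who wants to check this by hand, here is a minimal sketch of the "read its state file" approach. It assumes a Linux system where /proc/<pid>/status contains a "State:" line. The function name is_zombie and the PROC_ROOT override are hypothetical, introduced only for illustration; this is not something Overmind does.

```shell
#!/bin/sh
# is_zombie PID: exit status 0 if the process is in state Z, non-zero otherwise.
# PROC_ROOT is a hypothetical override (handy for testing); it defaults to /proc.
is_zombie() {
    status_file="${PROC_ROOT:-/proc}/$1/status"
    # No readable status file: the process does not exist or was already reaped.
    [ -r "$status_file" ] || return 1
    # The line looks like "State:<tab>Z (zombie)"; field 2 is the state letter.
    state=$(awk '/^State:/ { print $2; exit }' "$status_file" 2>/dev/null)
    [ "$state" = "Z" ]
}
```

With the PID from the report above, `is_zombie 346 && echo "api shell is a zombie"` would print the message while the defunct entry is still in the process table.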
The workaround proposed in https://github.com/tmux/tmux/issues/311 should theoretically work: prepend your commands with trap 'pkill -CHLD tmux' 0; or trap 'pkill -CHLD tmux' EXIT;.
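As a rough illustration, the prefix would go in front of the command in the Procfile entry. The command `node server.js` is a placeholder, not taken from the report, and this sketch assumes pkill is installed in the image:

```
api: trap 'pkill -CHLD tmux' EXIT; node server.js
```

The idea is that when the shell running the entry exits, the trap sends SIGCHLD to tmux so that it gets another chance to reap the child.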
To be honest, Overmind was never meant to run in production; it was developed mostly as a dev tool.