Process::isRunning returns true for defunct tagged process
After starting a process with:
Pipe out, err;
auto foo = Process::launch("foo", {}, nullptr, &out, &err);
The following snippet never exits the loop:
while (Process::isRunning(foo)) {
    cout << foo.id() << " still running" << endl;
    sleep_for(1s);
}
I've checked with ps -A | grep foo that the given foo process has been tagged as defunct by the system just after Process::launch, meaning that it has been started and exited successfully, but it still stays around as defunct until the parent application terminates:
.
.
.
32564 pts/15 00:00:00 zsh
32565 pts/5 00:00:00 foo <defunct>
32680 ? 00:10:33 geary
.
.
.
I'm running on x64 Linux with Poco 1.6.1.
I've realized that if foo.wait() gets called before Process::isRunning(foo), then isRunning does return false, meaning that the process stops hanging around as a zombie.
I'm not sure what should be the expected behavior. I'm running through all this with Poco because I'm attempting to construct a timed wait(), since Poco doesn't provide one and I can't risk blocking indefinitely.
So my first attempt was to periodically check isRunning and exit on timing out or the child process ending its execution.
That's the reality of this world: you need to call waitpid() for a child process to terminate, and Process::isRunning() doesn't do that. I've cooked up a small patch to make Process::isRunning(const ProcessHandle&) call waitpid(), but I cannot say I like it much (although it solves the issue).
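To make the point above concrete, here is a minimal POSIX sketch with no Poco involved. It assumes that a liveness check is implemented as a kill(pid, 0)-style probe; whether Process::isRunning() does exactly that in a given Poco version is an assumption. The names ZombieProbe, probeAlive, and probeZombie are made up for illustration. The probe keeps succeeding for an exited but unreaped child and only fails once waitpid() has collected it.

```cpp
#include <cassert>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Result of the experiment below (hypothetical type, for illustration only).
struct ZombieProbe {
    bool aliveWhileZombie;  // kill(pid, 0) succeeded before the child was reaped
    bool aliveAfterReap;    // kill(pid, 0) succeeded after waitpid()
    int  exitCode;          // exit code collected by waitpid()
};

// Signal 0 delivers nothing; it only checks that the PID exists and may be signalled.
static bool probeAlive(pid_t pid) {
    return ::kill(pid, 0) == 0;
}

ZombieProbe probeZombie() {
    ZombieProbe result = {false, false, -1};

    const pid_t pid = ::fork();
    assert(pid >= 0);
    if (pid == 0) ::_exit(7);  // child: terminate immediately

    // Block until the child has exited, but leave it unreaped (WNOWAIT),
    // so it is guaranteed to be a zombie at this point.
    siginfo_t info;
    const int rc = ::waitid(P_PID, static_cast<id_t>(pid), &info, WEXITED | WNOWAIT);
    assert(rc == 0);
    (void)rc;
    result.aliveWhileZombie = probeAlive(pid);

    // Reap the child; the kernel can now release its process table entry.
    int status = 0;
    ::waitpid(pid, &status, 0);
    if (WIFEXITED(status)) result.exitCode = WEXITSTATUS(status);
    result.aliveAfterReap = probeAlive(pid);
    return result;
}
```

Calling probeZombie() should report the child as "alive" while it is a zombie and as gone after the reap, which matches the loop in the original report never exiting.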
Ideally we should probably set up a dedicated thread which sits on a self-pipe read, calls waitpid() and notifies interested parties (ProcessHandle objects or user-defined listeners) of process status changes. This will be a breaking change though, in that we'll need to register a global SIGCHLD handler and thus potentially interfere with other code (external to Poco) which uses either SIGCHLD or waitpid() directly in the same executable.
Note that a dedicated thread will also allow for a nice Process::tryWait() / ProcessHandle::tryWait() implementation, which is possible on Windows right now but not on *NIX. Another good thing about it would be that users will no longer need to explicitly wait for a process, and this opens the door for detached launch (if you're only interested in launching the process, but not in waiting for it and getting the exit code).
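For readers unfamiliar with the self-pipe idea, a rough sketch of such a reaper thread could look like the following. This is not the patch discussed in this thread and not Poco code: ChildReaper, waitFor, onSigchld, and g_selfPipe are hypothetical names, and the sketch assumes a single reaper instance per process. It also shows the interference mentioned above, since it installs a process-wide SIGCHLD handler and its waitpid(-1, ...) call reaps every child, including ones started by unrelated code.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <chrono>
#include <condition_variable>
#include <csignal>
#include <map>
#include <mutex>
#include <thread>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int g_selfPipe[2] = {-1, -1};  // [0] = read end, [1] = write end

// Signal handler: async-signal-safe, it only writes one byte to wake the reaper.
static void onSigchld(int) {
    const int savedErrno = errno;
    const char byte = 'c';
    const ssize_t n = ::write(g_selfPipe[1], &byte, 1);
    (void)n;  // if the pipe is full, a wake-up is already pending
    errno = savedErrno;
}

// Hypothetical class, not part of Poco.
class ChildReaper {
public:
    ChildReaper() : _stop(false) {
        const int rc = ::pipe(g_selfPipe);
        assert(rc == 0);
        (void)rc;
        ::fcntl(g_selfPipe[1], F_SETFL, O_NONBLOCK);  // the handler must never block
        struct sigaction sa = {};
        sa.sa_handler = onSigchld;
        sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
        sigemptyset(&sa.sa_mask);
        ::sigaction(SIGCHLD, &sa, &_previous);
        _thread = std::thread(&ChildReaper::run, this);
    }

    ~ChildReaper() {
        ::sigaction(SIGCHLD, &_previous, nullptr);  // restore the earlier handler
        _stop = true;
        const char byte = 'q';
        const ssize_t n = ::write(g_selfPipe[1], &byte, 1);  // wake the thread
        (void)n;
        _thread.join();
        ::close(g_selfPipe[0]);
        ::close(g_selfPipe[1]);
    }

    // Waits up to `timeout` for `pid` to be reaped; on success stores the raw
    // waitpid() status (decode with WIFEXITED / WEXITSTATUS) and returns true.
    bool waitFor(pid_t pid, std::chrono::milliseconds timeout, int& status) {
        std::unique_lock<std::mutex> lock(_mutex);
        const bool done = _changed.wait_for(lock, timeout,
            [&] { return _exited.count(pid) != 0; });
        if (done) status = _exited[pid];
        return done;
    }

private:
    void run() {
        char buf[64];
        while (!_stop) {
            const ssize_t n = ::read(g_selfPipe[0], buf, sizeof buf);
            if (n < 0 && errno != EINTR) return;
            // One wake-up may stand for several exits, so drain all of them.
            int status = 0;
            pid_t pid;
            while ((pid = ::waitpid(-1, &status, WNOHANG)) > 0) {
                std::lock_guard<std::mutex> lock(_mutex);
                _exited[pid] = status;
            }
            _changed.notify_all();
        }
    }

    std::atomic<bool> _stop;
    struct sigaction _previous;
    std::mutex _mutex;
    std::condition_variable _changed;
    std::map<pid_t, int> _exited;  // pid -> raw wait status
    std::thread _thread;
};
```

With something like this in place, a timed wait reduces to creating the reaper before forking and calling waitFor(pid, timeout, status). No zombie is left behind even if nobody asks for the status, which is what would make a detached launch possible.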
I just came here for this very same question and don't share @mikedld's view on the reality of this world. A zombie process already has terminated, it consumes almost no memory and will never be on the run queue again. In my understanding such a process is not running anymore. The kernel merely holds a small data structure so somebody can get the exit status and other information.
If the behavior of Process::isRunning() can't be changed, a note regarding zombie processes should at least be in the documentation.
Regarding Process::tryWait() on *NIX: there is the WNOHANG option for waitpid(), which makes the call return immediately instead of blocking if the child has not exited yet.
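As a rough illustration of the WNOHANG approach, a polling timed wait on plain POSIX could be sketched as follows. waitWithTimeout and demoTimedWait are hypothetical helpers, not Poco API. The 10 ms poll interval and the mapping of a signal death to 128 + signal number (a shell convention) are arbitrary choices made for this sketch.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <csignal>
#include <thread>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper, not part of Poco. Polls waitpid(WNOHANG) until the child
// exits or the timeout elapses. Returns true and stores the exit code once the
// child has been reaped; returns false on timeout or on a waitpid() error.
bool waitWithTimeout(pid_t pid, std::chrono::milliseconds timeout, int& exitCode,
                     std::chrono::milliseconds pollInterval = std::chrono::milliseconds(10))
{
    assert(pid > 0);  // pid <= 0 would select a process group or any child
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        int status = 0;
        const pid_t rc = ::waitpid(pid, &status, WNOHANG);
        if (rc == pid) {
            // Child was reaped. A death by signal is reported as 128 + signal number.
            exitCode = WIFEXITED(status) ? WEXITSTATUS(status) : 128 + WTERMSIG(status);
            return true;
        }
        if (rc == -1 && errno != EINTR) return false;  // e.g. ECHILD: not our child
        if (std::chrono::steady_clock::now() >= deadline) return false;
        std::this_thread::sleep_for(pollInterval);
    }
}

// Usage sketch: give a slow child 100 ms, then kill it and reap it.
int demoTimedWait() {
    const pid_t pid = ::fork();
    if (pid < 0) return -1;
    if (pid == 0) { ::sleep(30); ::_exit(0); }  // child: pretends to hang

    int code = -1;
    if (!waitWithTimeout(pid, std::chrono::milliseconds(100), code)) {
        ::kill(pid, SIGKILL);                                          // timed out
        waitWithTimeout(pid, std::chrono::milliseconds(2000), code);   // reap it
    }
    return code;  // 128 + SIGKILL if the timeout path was taken
}
```

Polling trades a little latency for simplicity: unlike a SIGCHLD-based design, it needs no global handler and only ever reaps the one PID it was given.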
I'd like to note that, for example, subprocess.Popen in Python provides wait() with a timeout option.
Committed that small patch I was talking about before. Let's see where it leads us to.
@mikedld Couldn't check it yet, but thanks.
It appears that this fix was merged to pocoproject::develop, but has not been merged into any poco release (as of 1.9.1): https://github.com/pocoproject/poco/pull/1115
Is that intentional?
Not intentional; please send a pull request if you want it in the next release.
Shouldn't the issue be reopened until there's a plan of merging to releases?
@aleks-f , PR as requested: https://github.com/pocoproject/poco/pull/2535
Never fixed. This is truly the bad ending. RIP Josh's PR.
To others looking for a workable solution: you could wait for a read on a Poco::Pipe attached to the process's stdout to return 0 (end of file), or check for Process::tryWait(hndl) != -1.
@Icedude907, would you prepare a PR that would solve the problem and help others that are affected?
Hey @matejk, thanks for reaching out. I have found a solution I am happy with so I don't feel a pressing need to try fixing the issue upstream. Thank you for all the work you do for this project. Have a good day!
@Icedude907 at the time, Josh was asked for some further changes, but he had no time. Now, 6 years later, you complain about it but also can't be bothered to contribute. FWIW, here is the change adapted to the most recent release; maybe someone will find time to put it in order. I only checked on mac and tests do not pass.