coreutils icon indicating copy to clipboard operation
coreutils copied to clipboard

timeout: subprocess is never terminated

Open miles170 opened this issue 3 years ago • 7 comments

When I was working on #4073, I discovered that timeout never terminates the subprocess under the sh shell. Is this expected behavior?

$ timeout .5 sh -c "yes | split --filter='head -c1 >/dev/null' -b 1000"
$ ps aux | grep split
coreuti+ 55384  0.0  0.0   3244   712 pts/0    S+   05:05   0:00 grep split
$ /coreutils/target/release/timeout .5 sh -c "yes | split --filter='head -c1 >/dev/null' -b 1000"
$ ps aux | grep split
coreuti+ 55388  6.0  0.0   2680  1672 pts/0    S    05:05   0:00 split --filter=head -c1 >/dev/null -b 1000
coreuti+ 61201  0.0  0.0   3244   644 pts/0    S+   05:05   0:00 grep split

miles170 avatar Nov 22 '22 05:11 miles170

Nope, looks like a bug. Does it work with other times? Like, 0.5, 1 and 1.5?

tertsdiepraam avatar Nov 22 '22 07:11 tertsdiepraam

Nope, looks like a bug. Does it work with other times? Like, 0.5, 1 and 1.5?

$ timeout 1.5 sh -c "yes | split --filter='head -c1 >/dev/null' -b 1000"
$ ps aux | grep split
coreuti+  1916  0.0  0.0   3244   648 pts/0    S+   07:38   0:00 grep split
$ /coreutils/target/release/timeout 1.5 sh -c "yes | split --filter='head -c1 >/dev/null' -b 1000"
$ ps aux | grep split
coreuti+  1920  8.0  0.0   2944  1880 pts/0    S    07:38   0:00 split --filter=head -c1 >/dev/null -b 1000
coreuti+ 14847  0.0  0.0   3244   648 pts/0    S+   07:38   0:00 grep split

I don't think so. GNU timeout sends a signal to the process group if it is in the background. See https://github.com/coreutils/coreutils/blob/master/src/timeout.c#L236-L246

If I use libc::kill(0, signal) to send a signal to the process group, the timeout will receive it. Cannot find a way to send a signal to all children without breaking the current code.

miles170 avatar Nov 22 '22 07:11 miles170

Oh I see, I misinterpreted the explanation. Thanks!

tertsdiepraam avatar Nov 22 '22 08:11 tertsdiepraam

Would you mind giving me some hints so I can solve this problem?

miles170 avatar Nov 22 '22 08:11 miles170

I was just looking through the code, but I'm not sure what's going wrong, so I don't have any hints, sorry.

tertsdiepraam avatar Nov 22 '22 08:11 tertsdiepraam

@miles170: Before GNU timeout kills the process group it blocks the signal for itself first here.

It should be enough to just copy that behaviour, i.e. block the signal, before sending it to the process group, and also send SIGCONT to the child and the whole process group after that.


Note, that GNU timeout doesn't check for errors of signal. We shouldn't, either. For example: The subprocess/shell might have exited just after the timeout. Sending a signal now would return ESRCH, but we should still try to kill the children.

On linux, man 2 kill documents only 3 possible errors of kill(). IMHO all these errors can be ignored, and timeout() should continue trying to kill the process group.

       EINVAL An invalid signal was specified.

       EPERM  The  calling process does not have permission to send the signal
              to any of the target processes.

       ESRCH  The target process or process group does not exist.   Note  that
              an existing process might be a zombie, a process that has termi‐
              nated execution, but has not yet been wait(2)ed for.

castilma avatar Jan 31 '23 20:01 castilma

If the process exits by itself right after the timeout, GNU timeout shows a different behaviour: After sending the signals, it calls waitpid() again, which tells, why the process exited.

Our timeout just assumes the process exited because of the timeout. We should call wait() after sending the signals.

castilma avatar Jan 31 '23 21:01 castilma