ubuntu:16.04 /bin/kill is broken
Some testcases on "ubuntu:16.04" showed a different result than on "ubuntu:18.04" and all the other distros.
I have tracked it down to python's os.waitpid() returning an exitcode==0 even when the underlying process has actually failed with a nonzero exitcode. It is unknown where that bug comes from, but it seems rather serious to break such basic unix functionality: essentially, a parent process does not get the correct exitcode from its own children.
P.S. After creating a test program it seems that os.waitpid is fine, but /bin/kill is broken.
WORKAROUND:
import collections

def must_have_failed(waitpid, cmd):
    # Correct a bogus zero returncode from "/bin/kill" called without a pid.
    # (logg is the module-level logger)
    if cmd and cmd[0] == "/bin/kill":
        pid = None
        for arg in cmd[1:]:
            if not arg.startswith("-"):
                pid = arg
        if pid is None: # unknown $MAINPID
            if not waitpid.returncode:
                logg.error("waitpid %s did return %s => correcting as 11", cmd, waitpid.returncode)
                waitpidNEW = collections.namedtuple("waitpidNEW", ["pid", "returncode", "signal" ])
                waitpid = waitpidNEW(waitpid.pid, 11, waitpid.signal)
    return waitpid
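Namedtuples are immutable, which is why the workaround builds a fresh tuple instead of assigning to returncode. A minimal sketch of the same correction using the standard `_replace` method, assuming the result object is a namedtuple with the fields pid, returncode and signal (the `WaitpidResult` type and the `correct_returncode` helper are hypothetical stand-ins, not names from the project):

```python
import collections

# Hypothetical stand-in for the result object handed to must_have_failed().
WaitpidResult = collections.namedtuple("WaitpidResult", ["pid", "returncode", "signal"])

def correct_returncode(result, returncode=11):
    """Return a copy of result where a zero returncode is replaced."""
    if not result.returncode:
        # _replace() creates a new tuple; the original stays untouched.
        return result._replace(returncode=returncode)
    return result

if __name__ == "__main__":
    broken = WaitpidResult(pid=1234, returncode=0, signal=0)
    print(correct_returncode(broken))
```

This keeps the original tuple type, so callers that check the type or the field names keep working.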
TESTPROGRAM:
from __future__ import print_function
import os

def check():
    pid = os.fork()
    if not pid:
        # child: replace this process with /bin/kill (no pid argument)
        cmd = ["/bin/kill","-3"]
        os.execv(cmd[0], cmd)
    # parent: wait for the child and decode its status
    run_pid, run_stat = os.waitpid(pid, 0)
    exitcode = os.WEXITSTATUS(run_stat)
    sigcode = os.WTERMSIG(run_stat)
    print("exitcode", exitcode)

check()
This exitcode must not be 0, but it is 0 in an ubuntu:16.04 container (on opensuse/leap:15.0).
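WEXITSTATUS is only meaningful when the child exited normally, so a slightly more defensive variant of the test checks WIFEXITED and WIFSIGNALED first. The following is a sketch, not the attached script; the helper name `run_and_wait` and the exit code 127 for a failed exec are assumptions chosen for illustration:

```python
from __future__ import print_function
import os

def run_and_wait(cmd):
    """Fork, exec cmd, and return (exitcode, signal); the unused one is None."""
    pid = os.fork()
    if not pid:
        try:
            os.execv(cmd[0], cmd)
        finally:
            # Only reached when execv itself failed (e.g. missing binary).
            os._exit(127)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status), None
    if os.WIFSIGNALED(status):
        return None, os.WTERMSIG(status)
    return None, None

if __name__ == "__main__":
    cmd = ["/bin/kill", "-3"]
    print(cmd, "=>", run_and_wait(cmd))
```

Separating the two cases rules out that a zero exitcode is merely the leftover of a child that was terminated by a signal.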
According to https://wiki.ubuntu.com/Releases Ubuntu 16.04 LTS will be maintained until April 2021, so this problem could keep coming up for quite a while.
Here's a separate test program.
RESULTS:
> python os_waitpid_broken.py -c "/bin/false"
=== RESULTS ['/bin/false']
FROM centos:7.4.1708 => 1
FROM ubuntu:18.04 => 1
FROM ubuntu:16.04 => 1
FROM opensuse/leap:15.0 => 1
> python os_waitpid_broken.py -c "/bin/kill -3"
=== RESULTS ['/bin/kill', '-3']
FROM centos:7.4.1708 => 1
FROM ubuntu:18.04 => 1
FROM ubuntu:16.04 => 0
FROM opensuse/leap:15.0 => 1
> python os_waitpid_broken.py -c "/bin/kill -3" --direct
=== RESULTS ['/bin/kill', '-3']
FROM centos:7.4.1708 => 1
FROM ubuntu:18.04 => 1
FROM ubuntu:16.04 => 0
FROM opensuse/leap:15.0 => 1
The unix standard is a bit ambiguous here: http://pubs.opengroup.org/onlinepubs/009696899/utilities/kill.html
EXITSTATUS = 0 At least one matching process was found for each pid operand, and the specified signal was successfully processed for at least one matching process.
No matching process was found, but there was also nothing to be matched against.
Actually, gnu coreutils does have a testcase for it.
http://git.savannah.gnu.org/cgit/coreutils.git/tree/tests/misc/kill.sh
# params required
returns_ 1 env kill || fail=1
returns_ 1 env kill -TERM || fail=1
and there is a difference for that test as well:
> python os_waitpid_broken.py -c "/bin/kill -TERM" --direct
=== RESULTS ['/bin/kill', '-TERM']
FROM centos:7.4.1708 => 1
FROM ubuntu:18.04 => 1
FROM ubuntu:16.04 => 0
FROM opensuse/leap:15.0 => 1
> python os_waitpid_broken.py -c "/bin/kill" --direct
=== RESULTS ['/bin/kill']
FROM centos:7.4.1708 => 1
FROM ubuntu:18.04 => 1
FROM ubuntu:16.04 => 1
FROM opensuse/leap:15.0 => 1
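The two coreutils expectations quoted above can also be checked from Python without going through a shell, so that no shell builtin kill gets in the way (which is what `env` achieves in the coreutils test). A minimal sketch; the helper name `returns` merely mirrors the `returns_` shell function and is not part of any library:

```python
from __future__ import print_function
import subprocess

def returns(expected, cmd):
    """True when cmd could be started and exits with the expected status."""
    try:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except OSError:
        # The binary is missing or not executable.
        return False
    proc.communicate()
    return proc.returncode == expected

if __name__ == "__main__":
    for cmd in (["/bin/kill"], ["/bin/kill", "-TERM"]):
        print(cmd, "=>", "ok" if returns(1, cmd) else "FAIL")
```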
There is a hint in the ubuntu coreutils bug tracker that /bin/kill does not come from "gnu coreutils" but from the "procps" package.
https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/141168
However, that does not seem to be true for xenial (16.04), as /bin/kill is not in the file list of the package:
https://packages.ubuntu.com/xenial/procps
checking on all available ubuntu docker images:
python os_waitpid_broken.py -c "/bin/kill -TERM" --direct ubuntu:14.04 ubuntu:16.04 ubuntu:18.04 ubuntu:18.10
=== RESULTS ['/bin/kill', '-TERM']
FROM ubuntu:14.04 => 1
FROM ubuntu:16.04 => 0
FROM ubuntu:18.10 => 1
FROM ubuntu:18.04 => 1
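For anyone wanting to repeat the comparison without the test script, the same check can be approximated with plain `docker run`, which passes the exit status of the contained command through to the caller. This is a sketch under that assumption and not the attached script; the helper names `docker_command` and `exitcode_in_image` are made up for illustration:

```python
from __future__ import print_function
import subprocess

def docker_command(image, cmd):
    """Build a docker command line that runs cmd in a throwaway container."""
    return ["docker", "run", "--rm", image] + list(cmd)

def exitcode_in_image(image, cmd):
    """Run cmd inside image and return its exit status (needs a docker daemon)."""
    return subprocess.call(docker_command(image, cmd))

if __name__ == "__main__":
    images = ["ubuntu:14.04", "ubuntu:16.04", "ubuntu:18.04", "ubuntu:18.10"]
    for image in images:
        # Only print the command lines here; call exitcode_in_image() to run them.
        print(" ".join(docker_command(image, ["/bin/kill", "-TERM"])))
```

Note that docker itself reports 125, 126 or 127 when the container or the command cannot be started, so those values do not come from /bin/kill.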
Done: opened a bug report at ubuntu launchpad for "coreutils".
https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/1791983
Attaching a simplified version of the testscript.
As noted in the ubuntu ticket, the tool /bin/kill is really coming from the "procps" package.
== apt-file bin/kill
ubuntu:14.04 APTFILE procps: /bin/kill
ubuntu:14.04 MANPAGE procps-ng October 2011 KILL(1)
ubuntu:16.04 APTFILE procps: /bin/kill
ubuntu:16.04 MANPAGE procps-ng October 2011 KILL(1)
ubuntu:18.04 APTFILE procps: /bin/kill
ubuntu:18.04 MANPAGE
ubuntu:18.10 APTFILE procps: /bin/kill
ubuntu:18.10 MANPAGE
https://bugs.launchpad.net/ubuntu/+source/procps/+bug/1791983
No update on the bug ticket within a month. It may be left broken in Ubuntu 16.04 LTS.
Another month has passed without a hint.
Will Ubuntu 16.04 LTS be kept broken?