docker-systemctl-replacement

ubuntu:16.04 /bin/kill is broken

Open gdraheim opened this issue 7 years ago • 13 comments

Some testcases on "ubuntu:16.04" showed a different result than on "ubuntu:18.04" and all the other distros.

I have tracked it down: python's os.waitpid() returns an exitcode==0 even when the underlying process has actually failed with an exitcode!=0. It is unknown where that bug comes from, but it seems rather serious to break such basic unix functionality. Essentially, a parent process does not get the correct exitcode from its own children.

P.S. after creating a testprogram it seems that os.waitpid is fine, but /bin/kill is broken
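
A control check for the statement that os.waitpid itself is fine could look like the following minimal sketch. It does not involve /bin/kill at all: the child simply exits with a known code and the parent reads it back. The helper name child_exit_code is hypothetical and not part of the project.

```python
import os

def child_exit_code(code):
    """Fork a child that exits with `code` and return what os.waitpid() reports."""
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately, skipping Python cleanup handlers.
        os._exit(code)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # Child was terminated by a signal; report it as a negative number.
    return -os.WTERMSIG(status)

if __name__ == "__main__":
    print("exitcode", child_exit_code(3))
```

If this reports the code that was passed in, the parent/child exit status plumbing works and the difference has to come from the executed program.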

gdraheim avatar Sep 10 '18 20:09 gdraheim

WORKAROUND:

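import collections
import logging

logg = logging.getLogger("systemctl")  # assumed logger name

# Same fields as the incoming waitpid result, so callers can keep using it.
waitpidNEW = collections.namedtuple("waitpidNEW", ["pid", "returncode", "signal"])

def must_have_failed(waitpid, cmd):
    """Correct the result of a "/bin/kill" call that had no pid argument."""
    if cmd and cmd[0] == "/bin/kill":
        pid = None
        for arg in cmd[1:]:
            if not arg.startswith("-"):
                pid = arg  # any non-option argument counts as the pid
        if pid is None:  # unknown $MAINPID
            if not waitpid.returncode:
                logg.error("waitpid %s did return %s => correcting as 11", cmd, waitpid.returncode)
            # kill without a pid cannot have succeeded, so force a failure code
            waitpid = waitpidNEW(waitpid.pid, 11, waitpid.signal)
    return waitpid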
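
The workaround above treats any argument that does not start with "-" as the pid, so a call like "/bin/kill -s TERM" would look as if it had one. A slightly stricter check might be sketched as follows. This is only an illustration: has_pid_operand is a hypothetical helper, it knows about the POSIX "-s signal_name" form and "--" only, and it cannot tell a negative pid (process group) from an option.

```python
def has_pid_operand(cmd):
    """Return True if a kill command line names at least one pid operand."""
    args = list(cmd[1:])
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "--":
            # Everything after "--" is an operand, even if it starts with "-".
            return i + 1 < len(args)
        if arg == "-s":
            # The next word is the signal name, not a pid.
            i += 2
            continue
        if arg.startswith("-"):
            # Some other option such as "-3" or "-TERM".
            i += 1
            continue
        return True
    return False
```

With this, ["/bin/kill", "-3"] and ["/bin/kill", "-s", "TERM"] both count as having no pid, while ["/bin/kill", "-s", "TERM", "1234"] does have one.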

gdraheim avatar Sep 10 '18 20:09 gdraheim

TESTPROGRAM:

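from __future__ import print_function
import os

def check():
    pid = os.fork()
    if not pid:
        # child: replace this process with "/bin/kill -3" (no pid argument)
        cmd = ["/bin/kill", "-3"]
        try:
            os.execv(cmd[0], cmd)
        finally:
            os._exit(127)  # only reached if the exec itself failed
    # parent: collect the exit status of the child
    run_pid, run_stat = os.waitpid(pid, 0)
    exitcode = os.WEXITSTATUS(run_stat)
    sigcode = os.WTERMSIG(run_stat)  # 0 unless the child was killed by a signal
    print("exitcode", exitcode)

check()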

This exitcode must not be 0, but it is 0 in an ubuntu:16.04 container (on opensuse/leap:15.0).

gdraheim avatar Sep 10 '18 20:09 gdraheim

According to https://wiki.ubuntu.com/Releases, Ubuntu 16.04 LTS will be maintained until April 2021, so this problem could come up for quite a while longer.

gdraheim avatar Sep 10 '18 21:09 gdraheim

Here's a separate testprogram:

os_waitpid_broken.py

RESULTS:

> python os_waitpid_broken.py -c "/bin/false"
=== RESULTS ['/bin/false']
FROM centos:7.4.1708 => 1
FROM ubuntu:18.04 => 1
FROM ubuntu:16.04 => 1
FROM opensuse/leap:15.0 => 1

> python os_waitpid_broken.py -c "/bin/kill -3"
=== RESULTS ['/bin/kill', '-3']
FROM centos:7.4.1708 => 1
FROM ubuntu:18.04 => 1
FROM ubuntu:16.04 => 0
FROM opensuse/leap:15.0 => 1

> python os_waitpid_broken.py -c "/bin/kill -3" --direct
=== RESULTS ['/bin/kill', '-3']
FROM centos:7.4.1708 => 1
FROM ubuntu:18.04 => 1
FROM ubuntu:16.04 => 0
FROM opensuse/leap:15.0 => 1

gdraheim avatar Sep 11 '18 11:09 gdraheim

The unix standard is a bit ambiguous here: http://pubs.opengroup.org/onlinepubs/009696899/utilities/kill.html

EXITSTATUS = 0 At least one matching process was found for each pid operand, and the specified signal was successfully processed for at least one matching process.

No matching process was found, but there was also no pid operand to match against.

gdraheim avatar Sep 11 '18 11:09 gdraheim

Actually, gnu coreutils do have a testcase for it.

http://git.savannah.gnu.org/cgit/coreutils.git/tree/tests/misc/kill.sh

# params required
returns_ 1 env kill || fail=1
returns_ 1 env kill -TERM || fail=1
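
The same kind of check can be sketched in Python. The returns_ function below only borrows the name from the shell test above; it is a hypothetical stand-in, not the coreutils helper.

```python
import subprocess
import sys

def returns_(expected, cmd):
    """Run cmd and report whether its exit status equals `expected`."""
    proc = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return proc.returncode == expected

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python this_file.py /bin/kill -TERM
    # Exits with 0 if the given command returned 1, as the coreutils test expects.
    sys.exit(0 if returns_(1, sys.argv[1:]) else 1)
```

Calling returns_(1, ["/bin/kill", "-TERM"]) then answers whether the installed kill behaves the way the coreutils test requires.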

gdraheim avatar Sep 11 '18 15:09 gdraheim

And there is a difference for that test as well:

> python os_waitpid_broken.py -c "/bin/kill -TERM" --direct
=== RESULTS ['/bin/kill', '-TERM']
FROM centos:7.4.1708 => 1
FROM ubuntu:18.04 => 1
FROM ubuntu:16.04 => 0
FROM opensuse/leap:15.0 => 1

> python os_waitpid_broken.py -c "/bin/kill" --direct
=== RESULTS ['/bin/kill']
FROM centos:7.4.1708 => 1
FROM ubuntu:18.04 => 1
FROM ubuntu:16.04 => 1
FROM opensuse/leap:15.0 => 1

gdraheim avatar Sep 11 '18 15:09 gdraheim

There is a hint in the ubuntu coreutils bug tracker that /bin/kill does not come from "gnu coreutils" but from the "procps" package.

https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/141168

However, that does not seem to be true for xenial (16.04), as /bin/kill is not in the file list of the package:

https://packages.ubuntu.com/xenial/procps

gdraheim avatar Sep 11 '18 15:09 gdraheim

checking on all available ubuntu docker images:

python os_waitpid_broken.py -c "/bin/kill -TERM" --direct ubuntu:14.04 ubuntu:16.04 ubuntu:18.04 ubuntu:18.10

=== RESULTS ['/bin/kill', '-TERM']
FROM ubuntu:14.04 => 1
FROM ubuntu:16.04 => 0
FROM ubuntu:18.10 => 1
FROM ubuntu:18.04 => 1
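
A comparison across images like the one above can be approximated with plain "docker run", which passes the exit status of the container command through. The following is a minimal sketch under that assumption; it is not the attached os_waitpid_broken.py, and the function names are made up for illustration.

```python
import subprocess

def docker_command(image, cmd):
    """Build the `docker run` argument list for a one-shot container."""
    return ["docker", "run", "--rm", image] + list(cmd)

def exit_codes(images, cmd, run=subprocess.call):
    """Map each image to the exit status of cmd inside it.

    `run` can be replaced for testing; by default it really starts docker.
    """
    return {image: run(docker_command(image, cmd)) for image in images}

def report(results, cmd):
    """Format the results in the same style as the listings in this thread."""
    lines = ["=== RESULTS %s" % (list(cmd),)]
    for image, code in results.items():
        lines.append("FROM %s => %s" % (image, code))
    return "\n".join(lines)
```

For example, print(report(exit_codes(["ubuntu:14.04", "ubuntu:16.04"], ["/bin/kill", "-TERM"]), ["/bin/kill", "-TERM"])) needs a working docker installation and pulls the images if they are missing.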

gdraheim avatar Sep 11 '18 15:09 gdraheim

I have opened a bug report at Ubuntu Launchpad for "coreutils".

https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/1791983

Attaching a simplified version of the testscript.

bin_kill_broken.sh

gdraheim avatar Sep 11 '18 17:09 gdraheim

As noted in the ubuntu ticket, the tool /bin/kill is really coming from the "procps" package.

 == apt-file bin/kill
 ubuntu:14.04 APTFILE procps: /bin/kill
 ubuntu:14.04 MANPAGE procps-ng                        October 2011                          KILL(1)
 ubuntu:16.04 APTFILE procps: /bin/kill
 ubuntu:16.04 MANPAGE procps-ng                        October 2011                          KILL(1)
 ubuntu:18.04 APTFILE procps: /bin/kill
 ubuntu:18.04 MANPAGE 
 ubuntu:18.10 APTFILE procps: /bin/kill
 ubuntu:18.10 MANPAGE 
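
Inside a running Debian/Ubuntu container the owning package can also be looked up with "dpkg -S", which prints lines in the same "package: /path" form. Unlike apt-file it only knows installed packages. The sketch below assumes that output format; parse_owner and owner_of are hypothetical names.

```python
import subprocess

def parse_owner(dpkg_output, path):
    """Return the package name that lists `path` in `dpkg -S` style output."""
    for line in dpkg_output.splitlines():
        package, sep, filename = line.partition(": ")
        if sep and filename.strip() == path:
            return package
    return None

def owner_of(path):
    """Ask dpkg which installed package owns `path` (Debian/Ubuntu only)."""
    proc = subprocess.run(["dpkg", "-S", path], stdout=subprocess.PIPE,
                          stderr=subprocess.DEVNULL, universal_newlines=True)
    return parse_owner(proc.stdout, path) if proc.returncode == 0 else None
```

On the images listed above, owner_of("/bin/kill") would be expected to give "procps", matching the apt-file result.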

bin_kill_broken.sh

gdraheim avatar Sep 11 '18 20:09 gdraheim

https://bugs.launchpad.net/ubuntu/+source/procps/+bug/1791983

No update on the bug ticket within a month. It may be left broken in Ubuntu 16.04 LTS.

gdraheim avatar Oct 13 '18 12:10 gdraheim

Another month has passed without a hint.

Will Ubuntu 16.04 LTS be kept broken?

gdraheim avatar Nov 15 '18 07:11 gdraheim