
in cluster/mesos/docker, k8s cgroup-based pid lister is broken

Open jdef opened this issue 10 years ago • 2 comments

Kubernetes (manager.go in kubelet/dockertools) uses k8s.io/pkg/util/procfs to list the processes in a cgroup in order to set OOM adjustments. The default implementation looks at the devices cgroup and attempts to scrape a reasonable /sys/fs/cgroup/... path from it. In our case, however, /sys/fs/cgroup/docker/... doesn't exist but /sys/fs/cgroup/mesos/... does. This is probably related to the fact that our mesos-slave is running as a Docker container.

My host system, with some systemd components running (but not nearly as many as on Fedora):

$ uname -a
Linux node-1 3.19.0-30-generic #34~14.04.1-Ubuntu SMP Fri Oct 2 22:09:39 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
$ ps auxwww|grep -e systemd
root       416  0.0  0.0  51396  2152 ?        Ss   Oct18   0:00 /lib/systemd/systemd-udevd --daemon
root      1081  0.0  0.0  43456  2104 ?        Ss   Oct18   0:00 /lib/systemd/systemd-logind
$ docker version
Client:
 Version:      1.8.3
 API version:  1.20
 Go version:   go1.4.2
 Git commit:   f4bf5c7
 Built:        Mon Oct 12 05:37:18 UTC 2015
 OS/Arch:      linux/amd64

Server:
 Version:      1.8.3
 API version:  1.20
 Go version:   go1.4.2
 Git commit:   f4bf5c7
 Built:        Mon Oct 12 05:37:18 UTC 2015
 OS/Arch:      linux/amd64

from executor.log:

E1020 17:54:16.954531     200 oom_linux.go:89] Error getting process list for cgroup /docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553/mesos/a65c6fd1-793b-490c-a3cb-0e3abcdd6273/59dd9e4f1b7ae3865b74edacd04b4c23484b1e5d178d06b27fc10d7d04c94600: open /sys/fs/cgroup/devices/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553/mesos/a65c6fd1-793b-490c-a3cb-0e3abcdd6273/59dd9e4f1b7ae3865b74edacd04b4c23484b1e5d178d06b27fc10d7d04c94600/cgroup.procs: no such file or directory

docker inspect executed in mesos-slave CT, querying a pod CT:

...
        "CgroupParent": "/mesos/a65c6fd1-793b-490c-a3cb-0e3abcdd6273",
...

minion process cgroups (again, within the slave container):

slave:/# cat /proc/186/cgroup
12:name=systemd:/
11:hugetlb:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553
10:net_prio:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553
9:perf_event:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553
8:blkio:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553
7:net_cls:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553
6:freezer:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553/mesos/a65c6fd1-793b-490c-a3cb-0e3abcdd6273
5:devices:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553
4:memory:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553/mesos/a65c6fd1-793b-490c-a3cb-0e3abcdd6273
3:cpuacct:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553/mesos/a65c6fd1-793b-490c-a3cb-0e3abcdd6273
2:cpu:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553/mesos/a65c6fd1-793b-490c-a3cb-0e3abcdd6273
1:cpuset:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553

pod CT process cgroups (within the slave container):

root@mesosslave:/# cat /proc/353/cgroup
12:name=systemd:/
11:hugetlb:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553/mesos/a65c6fd1-793b-490c-a3cb-0e3abcdd6273/59dd9e4f1b7ae3865b74edacd04b4c23484b1e5d178d06b27fc10d7d04c94600
10:net_prio:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553/mesos/a65c6fd1-793b-490c-a3cb-0e3abcdd6273/59dd9e4f1b7ae3865b74edacd04b4c23484b1e5d178d06b27fc10d7d04c94600
9:perf_event:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553/mesos/a65c6fd1-793b-490c-a3cb-0e3abcdd6273/59dd9e4f1b7ae3865b74edacd04b4c23484b1e5d178d06b27fc10d7d04c94600
8:blkio:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553/mesos/a65c6fd1-793b-490c-a3cb-0e3abcdd6273/59dd9e4f1b7ae3865b74edacd04b4c23484b1e5d178d06b27fc10d7d04c94600
7:net_cls:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553/mesos/a65c6fd1-793b-490c-a3cb-0e3abcdd6273/59dd9e4f1b7ae3865b74edacd04b4c23484b1e5d178d06b27fc10d7d04c94600
6:freezer:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553/mesos/a65c6fd1-793b-490c-a3cb-0e3abcdd6273/59dd9e4f1b7ae3865b74edacd04b4c23484b1e5d178d06b27fc10d7d04c94600
5:devices:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553/mesos/a65c6fd1-793b-490c-a3cb-0e3abcdd6273/59dd9e4f1b7ae3865b74edacd04b4c23484b1e5d178d06b27fc10d7d04c94600
4:memory:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553/mesos/a65c6fd1-793b-490c-a3cb-0e3abcdd6273/59dd9e4f1b7ae3865b74edacd04b4c23484b1e5d178d06b27fc10d7d04c94600
3:cpuacct:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553/mesos/a65c6fd1-793b-490c-a3cb-0e3abcdd6273/59dd9e4f1b7ae3865b74edacd04b4c23484b1e5d178d06b27fc10d7d04c94600
2:cpu:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553/mesos/a65c6fd1-793b-490c-a3cb-0e3abcdd6273/59dd9e4f1b7ae3865b74edacd04b4c23484b1e5d178d06b27fc10d7d04c94600
1:cpuset:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553/mesos/a65c6fd1-793b-490c-a3cb-0e3abcdd6273/59dd9e4f1b7ae3865b74edacd04b4c23484b1e5d178d06b27fc10d7d04c94600

PID 1 cgroups (mesos-slave process, running inside the mesosslave container):

root@mesosslave:/# cat /proc/1/cgroup
12:name=systemd:/
11:hugetlb:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553
10:net_prio:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553
9:perf_event:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553
8:blkio:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553
7:net_cls:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553
6:freezer:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553
5:devices:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553
4:memory:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553
3:cpuacct:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553
2:cpu:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553
1:cpuset:/docker/4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553

devices cgroups visible in slave CT (this is how k8s/util/procfs makes decisions):

root@mesosslave:/# ls -F /sys/fs/cgroup/devices/
cgroup.clone_children  cgroup.procs  devices.allow  devices.deny  devices.list  mesos/  notify_on_release  tasks
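
To make the failure concrete, here is a rough Python sketch of the lookup (the kubelet itself is written in Go; `devices_cgroup` and `procs_path` are hypothetical names, not the procfs package's API). It assumes the lister takes the devices entry from /proc/<pid>/cgroup and joins it onto /sys/fs/cgroup/devices, which is what the path in the executor.log error suggests.

```python
import posixpath

CGROUP_ROOT = "/sys/fs/cgroup"

# IDs copied from the listings in this issue.
SLAVE_CT = "4742614cfab2d5c32f664fc9c0a79741fbd2971784be3dee63c6f2c6b7362553"
MESOS_ID = "a65c6fd1-793b-490c-a3cb-0e3abcdd6273"
POD_CT = "59dd9e4f1b7ae3865b74edacd04b4c23484b1e5d178d06b27fc10d7d04c94600"


def devices_cgroup(proc_cgroup_text):
    """Return the devices cgroup path found in /proc/<pid>/cgroup content, or None."""
    for line in proc_cgroup_text.splitlines():
        # Each line has the form "<hierarchy-id>:<controllers>:<path>".
        parts = line.strip().split(":", 2)
        if len(parts) == 3 and "devices" in parts[1].split(","):
            return parts[2]
    return None


def procs_path(cgroup):
    """Build the cgroup.procs path the way the error message implies (assumption)."""
    return posixpath.join(CGROUP_ROOT, "devices", cgroup.lstrip("/"), "cgroup.procs")


# Abbreviated content of /proc/353/cgroup (the pod CT process).
pod_cgroup = "/docker/" + SLAVE_CT + "/mesos/" + MESOS_ID + "/" + POD_CT
proc_353_cgroup = (
    "12:name=systemd:/\n"
    "6:freezer:" + pod_cgroup + "\n"
    "5:devices:" + pod_cgroup + "\n"
)

naive_path = procs_path(devices_cgroup(proc_353_cgroup))
print(naive_path)
```

The printed path starts with /sys/fs/cgroup/devices/docker/..., but the ls output above shows only a mesos/ directory under /sys/fs/cgroup/devices/ inside the slave CT, hence the "no such file or directory" error.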

jdef, Oct 20 '15 18:10

I've tried running both my host's docker daemon and the slave's docker daemon in forced cgroupfs mode, and I see the same symptoms. /proc/<pid>/cgroup includes cgroup paths that make sense from the perspective of the host, but not from the perspective of the slave CT.

To resolve this we'd need to refactor the kubelet a bit: NewKubelet() should accept a procfs parameter (instead of creating the default on its own), and then we'd be able to plug in a custom procfs implementation that knows how to handle this situation. Running inside a container is easily detectable by examining the contents of /proc/1/cgroup:

from the host:

vagrant@node-1:~$ cat /proc/1/cgroup
12:name=systemd:/
11:hugetlb:/
10:net_prio:/
9:perf_event:/
8:blkio:/
7:net_cls:/
6:freezer:/
5:devices:/
4:memory:/
3:cpuacct:/
2:cpu:/
1:cpuset:/

from inside the slave CT:

root@mesosslave:/# cat /proc/1/cgroup
12:name=systemd:/
11:hugetlb:/docker/611a8d133d2c636e777745e4dbb1d5866a068fe0f6cc87033e52fe139fca5ba1
10:net_prio:/docker/611a8d133d2c636e777745e4dbb1d5866a068fe0f6cc87033e52fe139fca5ba1
9:perf_event:/docker/611a8d133d2c636e777745e4dbb1d5866a068fe0f6cc87033e52fe139fca5ba1
8:blkio:/docker/611a8d133d2c636e777745e4dbb1d5866a068fe0f6cc87033e52fe139fca5ba1
7:net_cls:/docker/611a8d133d2c636e777745e4dbb1d5866a068fe0f6cc87033e52fe139fca5ba1
6:freezer:/docker/611a8d133d2c636e777745e4dbb1d5866a068fe0f6cc87033e52fe139fca5ba1
5:devices:/docker/611a8d133d2c636e777745e4dbb1d5866a068fe0f6cc87033e52fe139fca5ba1
4:memory:/docker/611a8d133d2c636e777745e4dbb1d5866a068fe0f6cc87033e52fe139fca5ba1
3:cpuacct:/docker/611a8d133d2c636e777745e4dbb1d5866a068fe0f6cc87033e52fe139fca5ba1
2:cpu:/docker/611a8d133d2c636e777745e4dbb1d5866a068fe0f6cc87033e52fe139fca5ba1
1:cpuset:/docker/611a8d133d2c636e777745e4dbb1d5866a068fe0f6cc87033e52fe139fca5ba1
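
A minimal sketch of that detection, in Python for brevity (the kubelet is Go). All function names here are hypothetical, and the prefix-stripping step is an assumption about what a custom procfs implementation might do, not something settled in this issue. It also assumes the cgroup filesystem visible in the container is rooted at the container's own cgroup, and the pod suffix below is an invented example.

```python
def cgroup_paths(proc_cgroup_text):
    """Map each controller field of /proc/<pid>/cgroup to its cgroup path."""
    paths = {}
    for line in proc_cgroup_text.splitlines():
        # Each line has the form "<hierarchy-id>:<controllers>:<path>".
        parts = line.strip().split(":", 2)
        if len(parts) == 3:
            paths[parts[1]] = parts[2]
    return paths


def container_prefix(pid1_cgroup_text, subsystem="devices"):
    """Return PID 1's cgroup path, or "" when PID 1 sits at the root (host)."""
    path = cgroup_paths(pid1_cgroup_text).get(subsystem, "/")
    return "" if path == "/" else path.rstrip("/")


def rebase(cgroup, prefix):
    """Strip the container's own prefix so the path matches the local cgroup tree."""
    if prefix and (cgroup == prefix or cgroup.startswith(prefix + "/")):
        return cgroup[len(prefix):] or "/"
    return cgroup


# Abbreviated /proc/1/cgroup content from the two listings above.
HOST_PID1 = "12:name=systemd:/\n5:devices:/\n"
SLAVE_ID = "611a8d133d2c636e777745e4dbb1d5866a068fe0f6cc87033e52fe139fca5ba1"
SLAVE_PID1 = "12:name=systemd:/\n5:devices:/docker/" + SLAVE_ID + "\n"

prefix = container_prefix(SLAVE_PID1)

# Hypothetical pod cgroup as it would appear in /proc/<pid>/cgroup inside the slave CT.
pod = prefix + "/mesos/example-executor/example-pod"

print(repr(container_prefix(HOST_PID1)))  # empty prefix: not containerized
print(rebase(pod, prefix))                # path relative to the slave CT's cgroup root
```

On the host the prefix is empty and paths pass through unchanged, so the same code path could serve both the containerized and the non-containerized case.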

jdef, Oct 20 '15 20:10

related to #355

jdef, Feb 01 '16 23:02