cgroup v2 support is not finding the correct files
Describe the bug
The cgroup v2 code attempts to use files from the root cgroup that are not available there. Instead, it should find the process's child cgroup and read the files from that.
The code first reads /proc/self/mountinfo to find the cgroup mount point. With cgroups v2, that mount point will contain the root cgroup. It then attempts to read cpu.max, which according to https://docs.kernel.org/admin-guide/cgroup-v2.html only exists in child cgroups, so it can never be found at the mount path for the root cgroup. It also reads cgroup.controllers, which is a per-group setting, so it should be read from the group path as well.
What it should do instead is read /proc/self/cgroup and append the path found there to the mount point to get the child cgroup.
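The lookup described above can be sketched roughly as follows. This is an illustration in Python, not the ERTS implementation (which is C); the function names are hypothetical, and the sketch assumes a single cgroup2 mount whose mount root is "/" and a mount point without escaped characters.

```python
import posixpath


def cgroup2_mount_point(mountinfo_text):
    """Return the mount point of the first cgroup2 filesystem, or None."""
    for line in mountinfo_text.splitlines():
        # In mountinfo, " - " separates the per-mount fields from
        # "fstype source super-options".
        left, sep, right = line.partition(" - ")
        if not sep:
            continue
        left_fields = left.split()
        right_fields = right.split()
        if len(left_fields) >= 5 and right_fields and right_fields[0] == "cgroup2":
            return left_fields[4]  # fifth field is the mount point
    return None


def cgroup2_relative_path(proc_cgroup_text):
    """Return the path from the cgroup v2 entry ("0::<path>"), or None."""
    for line in proc_cgroup_text.splitlines():
        hierarchy_id, _, rest = line.partition(":")
        controllers, _, path = rest.partition(":")
        if hierarchy_id == "0" and controllers == "":
            return path
    return None


def cgroup2_dir(mountinfo_text, proc_cgroup_text):
    """Join the mount point and the per-process cgroup path."""
    mount = cgroup2_mount_point(mountinfo_text)
    rel = cgroup2_relative_path(proc_cgroup_text)
    if mount is None or rel is None:
        return None
    # normpath drops the trailing slash when the process is in the root cgroup.
    return posixpath.normpath(posixpath.join(mount, rel.lstrip("/")))
```

Files such as cpu.max and cgroup.controllers would then be opened relative to the directory returned by cgroup2_dir rather than relative to the mount point itself.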
Sample from a Kubernetes pod:
/ # cat /proc/self/mountinfo | grep cgroup
6580 6561 0:29 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime - cgroup2 cgroup rw
/ # ls /sys/fs/cgroup/cpu.max
ls: /sys/fs/cgroup/cpu.max: No such file or directory
/ # cat /proc/self/cgroup
0::/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda9ad8f2a_f2bb_4705_adf5_9e879ab1e906.slice/cri-containerd-bba15c286d5c44f8374f10c3a6cb7e7ae81db724eff4cf001ad28d2d6485f0da.scope
/ # cat /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda9ad8f2a_f2bb_4705_adf5_9e879ab1e906.slice/cri-containerd-bba15c286d5c44f8374f10c3a6cb7e7ae81db724eff4cf001ad28d2d6485f0da.scope/cpu.max
800000 100000
Note: It would seem that the code to read cpu.max is also broken. The scanf format expects two integers, but the first value can be the string max.
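As a rough illustration of the parsing issue in the note above, here is a Python sketch of a cpu.max parser that accepts both forms. The function names are hypothetical and this is not the ERTS code; it only assumes the file holds two whitespace-separated fields, where the first is either an integer quota or the literal string max.

```python
def parse_cpu_max(text):
    """Parse the contents of cpu.max into (quota, period).

    quota is None when the first field is the literal "max" (no limit).
    """
    fields = text.split()
    if len(fields) != 2:
        raise ValueError("expected two fields in cpu.max, got %r" % (text,))
    quota_str, period_str = fields
    quota = None if quota_str == "max" else int(quota_str)
    period = int(period_str)
    return quota, period


def cpu_limit(text):
    """Return the CPU limit as quota / period, or None when unlimited."""
    quota, period = parse_cpu_max(text)
    if quota is None:
        return None
    return quota / period
```

With the values shown in this report, "800000 100000" yields a limit of 8.0 CPUs, while "max 100000" yields None (no limit), which is the case a two-integer scanf format cannot parse.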
To Reproduce
Steps to reproduce the behavior.
- check that Erlang tries (unsuccessfully) to read /sys/fs/cgroup/cpu.max, e.g. with strace:
$ strace -o /tmp/f1 -f erl -s init stop
Erlang/OTP 26 [erts-14.1.1] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit:ns]
$ grep /sys/fs/cgroup /tmp/f1
19270 openat(AT_FDCWD, "/sys/fs/cgroup/cgroup.controllers", O_RDONLY) = 4
19270 openat(AT_FDCWD, "/sys/fs/cgroup/cpu.max", O_RDONLY) = -1 ENOENT (No such file or directory)
19270 openat(AT_FDCWD, "/sys/fs/cgroup/cgroup.controllers", O_RDONLY) = 4
19270 openat(AT_FDCWD, "/sys/fs/cgroup/cpu.max", O_RDONLY) = -1 ENOENT (No such file or directory)
19270 openat(AT_FDCWD, "/sys/fs/cgroup/cgroup.controllers", O_RDONLY) = 13
19270 openat(AT_FDCWD, "/sys/fs/cgroup/cpu.max", O_RDONLY) = -1 ENOENT (No such file or directory)
- check the cgroups setup, e.g.:
$ cat /proc/self/mountinfo | grep cgroup
35 24 0:30 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime shared:10 - cgroup2 cgroup2 rw,nsdelegate,memory_recursiveprot
$ cat /proc/self/cgroup
0::/user.slice/user-1000.slice/session-c2.scope
$ cat /sys/fs/cgroup/user.slice/user-1000.slice/session-c2.scope/cpu.max
max 100000
Expected behavior
Erlang should read the cgroup v2 information from the per-process slice found in /proc/self/cgroup.
Affected versions
Verified on OTP-25.3 and OTP-26.1.