cgroup v2 support is not finding the correct files

Open RoadRunnr opened this issue 1 year ago • 5 comments

Describe the bug

The cgroup v2 code attempts to use files from the root cgroup that are not available. Instead, it should find the process's child cgroup and read from that. The code first reads /proc/self/mountinfo to find the cgroup mount point. With cgroups v2, that mount point will contain the root cgroup. It then attempts to read cpu.max which, according to https://docs.kernel.org/admin-guide/cgroup-v2.html, only exists in child cgroups, so it can never be found at the mount path for the root cgroup. It also reads cgroup.controllers, which is a per-group setting, so it should be read from the group path as well.

What it should do instead is to read /proc/self/cgroup and append the path in there to the mount point to get the child cgroup.
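A minimal sketch of that lookup, in Python rather than the C used by the runtime, purely to illustrate the idea. The function names are hypothetical and not taken from the OTP sources. It assumes a single cgroup2 mount whose mount root is `/`, and it ignores the octal escapes that mountinfo can use in paths.

```python
import posixpath


def find_cgroup2_mount(mountinfo_text):
    """Return the mount point of the first cgroup2 filesystem, or None."""
    for line in mountinfo_text.splitlines():
        # Fields after the " - " separator are: fstype, source, super options.
        left, sep, right = line.partition(" - ")
        if not sep:
            continue
        mount_fields = left.split()
        fs_fields = right.split()
        if fs_fields and fs_fields[0] == "cgroup2" and len(mount_fields) >= 5:
            return mount_fields[4]  # the fifth field is the mount point
    return None


def find_cgroup2_path(proc_cgroup_text):
    """Return the path of the cgroup v2 entry ("0::<path>"), or None."""
    for line in proc_cgroup_text.splitlines():
        hierarchy_id, sep1, rest = line.partition(":")
        controllers, sep2, path = rest.partition(":")
        if sep1 and sep2 and hierarchy_id == "0" and controllers == "":
            return path
    return None


def resolve_cgroup_dir(mountinfo_text, proc_cgroup_text):
    """Join the cgroup2 mount point with the process's own cgroup path."""
    mount = find_cgroup2_mount(mountinfo_text)
    path = find_cgroup2_path(proc_cgroup_text)
    if mount is None or path is None:
        return None
    return posixpath.normpath(mount + "/" + path)
```

Files such as cpu.max and cgroup.controllers would then be opened relative to the directory returned by `resolve_cgroup_dir` instead of the bare mount point.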

Sample from kubernetes pod:

/ # cat /proc/self/mountinfo | grep cgroup
6580 6561 0:29 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime - cgroup2 cgroup rw
/ # ls /sys/fs/cgroup/cpu.max
ls: /sys/fs/cgroup/cpu.max: No such file or directory
/ # cat /proc/self/cgroup 
0::/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda9ad8f2a_f2bb_4705_adf5_9e879ab1e906.slice/cri-containerd-bba15c286d5c44f8374f10c3a6cb7e7ae81db724eff4cf001ad28d2d6485f0da.scope
/ # cat /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda9ad8f2a_f2bb_4705_adf5_9e879ab1e906.slice/cri-containerd-bba15c286d5c44f8374f10c3a6cb7e7ae81db724eff4cf001ad28d2d6485f0da.scope/cpu.max
800000 100000

Note: It would seem that the code to read cpu.max is also broken. The scanf format expects two integers, but the first value can be the string max.
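To illustrate the parsing problem, here is a small Python sketch (not the runtime's C code) that accepts both forms of cpu.max shown in this report. The function name `parse_cpu_max` is hypothetical, and representing "no limit" as `None` is an assumption made for the example.

```python
def parse_cpu_max(text):
    """Parse the content of cpu.max into (quota, period).

    The first field is either an integer quota in microseconds or the
    literal string "max", which is returned as None (no limit).
    """
    fields = text.split()
    if len(fields) != 2:
        raise ValueError("expected two fields in cpu.max, got %r" % text)
    quota_field, period_field = fields
    quota = None if quota_field == "max" else int(quota_field)
    period = int(period_field)
    return quota, period
```

With the two samples from this issue, `parse_cpu_max("800000 100000")` gives a quota of 800000 over a period of 100000 (eight CPUs' worth of time), while `parse_cpu_max("max 100000")` gives no quota at all. A parser that insists on two integers would reject the second case.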

To Reproduce

Steps to reproduce the behavior.

  • check that Erlang tries to (unsuccessfully) read /sys/fs/cgroup/cpu.max with e.g. strace:
$ strace -o /tmp/f1 -f erl -s init stop
Erlang/OTP 26 [erts-14.1.1] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit:ns]

$ grep /sys/fs/cgroup /tmp/f1
19270 openat(AT_FDCWD, "/sys/fs/cgroup/cgroup.controllers", O_RDONLY) = 4
19270 openat(AT_FDCWD, "/sys/fs/cgroup/cpu.max", O_RDONLY) = -1 ENOENT (No such file or directory)
19270 openat(AT_FDCWD, "/sys/fs/cgroup/cgroup.controllers", O_RDONLY) = 4
19270 openat(AT_FDCWD, "/sys/fs/cgroup/cpu.max", O_RDONLY) = -1 ENOENT (No such file or directory)
19270 openat(AT_FDCWD, "/sys/fs/cgroup/cgroup.controllers", O_RDONLY) = 13
19270 openat(AT_FDCWD, "/sys/fs/cgroup/cpu.max", O_RDONLY) = -1 ENOENT (No such file or directory)
  • check cgroups setup, e.g.
$ cat /proc/self/mountinfo | grep cgroup
35 24 0:30 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime shared:10 - cgroup2 cgroup2 rw,nsdelegate,memory_recursiveprot
$ cat /proc/self/cgroup 
0::/user.slice/user-1000.slice/session-c2.scope
$ cat /sys/fs/cgroup/user.slice/user-1000.slice/session-c2.scope/cpu.max
max 100000

Expected behavior

Erlang should read the cgroup v2 information from the per-process cgroup path found in /proc/self/cgroup.
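As a rough illustration of the expected behavior, the Python sketch below reads both files from a given group directory rather than from the mount point. The helper names are hypothetical, and the directory is passed in as a parameter so the example makes no claim about how the runtime would obtain it.

```python
import os


def read_cgroup_file(group_dir, name):
    """Return the stripped content of <group_dir>/<name>, or None if unreadable."""
    try:
        with open(os.path.join(group_dir, name)) as f:
            return f.read().strip()
    except OSError:
        return None


def read_cpu_max(group_dir):
    """Return the raw cpu.max content if the cpu controller is listed, else None."""
    controllers = read_cgroup_file(group_dir, "cgroup.controllers")
    if controllers is None or "cpu" not in controllers.split():
        return None
    return read_cgroup_file(group_dir, "cpu.max")
```

On the desktop system from the reproduction steps, `group_dir` would be /sys/fs/cgroup/user.slice/user-1000.slice/session-c2.scope.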

Affected versions

Verified on OTP-25.3 and OTP-26.1.

RoadRunnr, Dec 01 '23 16:12