Can't create containers if there is a v1 cpuset cgroup with exclusive cores

Open michalsieron opened this issue 11 months ago • 3 comments

Steps to reproduce:

  1. Boot with systemd.unified_cgroup_hierarchy=0 (in my case, in a VM with 2 cores).
  2. Create an arbitrary cgroup: # mkdir /sys/fs/cgroup/cpuset/iamspecial
  3. Assign one of the cores to that cgroup: # echo 1 > /sys/fs/cgroup/cpuset/iamspecial/cpuset.cpus
  4. Make that cpuset exclusive: # echo 1 > /sys/fs/cgroup/cpuset/iamspecial/cpuset.cpu_exclusive
  5. Generate a spec with a dummy rootfs and run a container: # crun spec; mkdir -p rootfs/usr/bin; touch rootfs/usr/bin/sh; crun run test1234

Expected result: open executable: Permission denied

Actual result: write `cpuset.cpus`: Invalid argument


Some short notes and observations:

  • In step 5, you could of course replace the dummy rootfs with a proper one, such as busybox, and expect an actual shell to open.
  • For what it's worth, runc fails the same way, although it is more explicit: it reports that it failed when writing 0-1 to /sys/fs/cgroup/cpuset/test1234/cpuset.cpus
  • With crun, using --systemd-cgroup only changes the error to write `cpuset.cpus`: Permission denied (runc still gets EINVAL)
  • Adding "cpu": { "cpus": "0" } in the linux.resources section fixes the issue in most cases
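
To illustrate the last point, here is a minimal Python sketch that adds the setting to a config.json such as the one generated by crun spec. The helper names set_cpuset_cpus and patch_config_file are made up for this example; the key path linux.resources.cpu.cpus is the one named above.

```python
import json


def set_cpuset_cpus(config: dict, cpus: str) -> dict:
    """Set linux.resources.cpu.cpus, creating intermediate objects as needed."""
    cpu = (
        config.setdefault("linux", {})
        .setdefault("resources", {})
        .setdefault("cpu", {})
    )
    cpu["cpus"] = cpus
    return config


def patch_config_file(path: str, cpus: str) -> None:
    """Hypothetical helper: rewrite an OCI config.json in place."""
    with open(path) as f:
        config = json.load(f)
    set_cpuset_cpus(config, cpus)
    with open(path, "w") as f:
        json.dump(config, f, indent=2)


# Stand-in for a freshly generated config; a real one has many more keys.
example = {"linux": {"resources": {}}}
set_cpuset_cpus(example, "0")
print(example["linux"]["resources"]["cpu"])  # prints {'cpus': '0'}
```

Calling patch_config_file("config.json", "0") between crun spec and crun run in step 5 would pin the container to CPU 0, which is the core left outside the exclusive cpuset in the reproduction.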

The table below summarizes these combinations:

| | crun | crun --systemd-cgroup | runc | runc --systemd-cgroup |
|---|---|---|---|---|
| default config | EINVAL | EACCES | EINVAL | EINVAL |
| "cpu": { "cpus": "0" } | EINVAL | OK | OK | OK |

So, the main problem here is that crun (and runc) tries to initialize the newly created cpuset.cpus with the value taken from the parent cgroup. According to the Linux documentation for cgroups v2(!):

An empty value indicates that the cgroup is using the same setting as the nearest cgroup ancestor with a non-empty “cpuset.cpus” or all the available CPUs if none is found.

I cannot find a similar description for cgroups v1. Is that why that initialization is needed? If so, how does one handle cgroups with exclusive cpus? Does one have to traverse the entire cpuset tree to find available cpus? If so, I feel this issue won't be fixed, given cgroups v1 are obsolete anyway.
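
As a rough idea of what such a lookup might involve, here is a Python sketch that takes the parent's CPU list and removes every CPU claimed by an exclusive sibling. It assumes that checking the direct siblings of the new cgroup is sufficient, which I have not verified against the kernel for every hierarchy, and it is not what crun or runc currently do. All function names are invented for this example.

```python
def parse_cpu_list(text: str) -> set[int]:
    """Parse a kernel CPU list such as "0-1,3" into a set of CPU numbers."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        part = part.strip()
        if not part:
            continue  # an empty cpuset.cpus yields no CPUs
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def format_cpu_list(cpus: set[int]) -> str:
    """Format CPU numbers back into list form, e.g. {0, 1, 3} -> "0-1,3"."""
    ordered = sorted(cpus)
    ranges = []
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        # Extend the current range while the next CPU is consecutive.
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        ranges.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(ranges)


def available_cpus(parent_cpus: str, siblings: list[tuple[str, bool]]) -> str:
    """Return the parent's CPUs minus those held by exclusive siblings.

    Each sibling is a (cpuset.cpus, cpuset.cpu_exclusive) pair.
    Assumption: only direct siblings need to be considered.
    """
    remaining = parse_cpu_list(parent_cpus)
    for cpus, exclusive in siblings:
        if exclusive:
            remaining -= parse_cpu_list(cpus)
    return format_cpu_list(remaining)


# The setup from the reproduction: the root cpuset has CPUs 0-1,
# and iamspecial holds CPU 1 exclusively.
print(available_cpus("0-1", [("1", True)]))  # prints 0
```

In the reproduction this yields "0", the same value that works when it is set by hand in linux.resources.cpu.cpus.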

The secondary problem seems to be that --cgroup-manager=cgroupfs in crun ignores linux.resources.cpu.cpus. Or rather, it will apply them only after initialization happens, as it uses initialize_cpuset_subsystem(). Compare that with --cgroup-manager=systemd, which uses initialize_cpuset_subsystem_resources() and therefore won't attempt putting all cpus in the cpuset.cpus file.

michalsieron, Dec 13 '24 11:12