crun
Can't create containers if there is a v1 cpuset cgroup with exclusive cores
Steps to reproduce:

1. Boot with `systemd.unified_cgroup_hierarchy=0` (in my case it's in a VM with 2 cores).
2. Create a random cgroup:
   `# mkdir /sys/fs/cgroup/cpuset/iamspecial`
3. Assign one of the cores to that cgroup:
   `# echo 1 > /sys/fs/cgroup/cpuset/iamspecial/cpuset.cpus`
4. Make that cpuset exclusive:
   `# echo 1 > /sys/fs/cgroup/cpuset/iamspecial/cpuset.cpu_exclusive`
5. `# crun spec; mkdir -p rootfs/usr/bin; touch rootfs/usr/bin/sh; crun run test1234`
Expected result: `open executable: Permission denied`

Actual result: ``write `cpuset.cpus`: Invalid argument``
Some short notes and observations:

- Of course, in step 5 you could replace the rootfs with something more proper, like busybox, and expect an actual shell to open.
- For what it's worth, `runc` fails the same way, although it is more explicit, telling us that it failed when writing `0-1` to `/sys/fs/cgroup/cpuset/test1234/cpuset.cpus`.
- With `--systemd-cgroup`, `crun` instead fails with ``write `cpuset.cpus`: Permission denied`` (`runc` still gets `EINVAL`).
- Adding `"cpu": { "cpus": "0" }` to the `linux.resources` section fixes the issue in most cases (everywhere except plain `crun`).
The table below summarizes all those combinations:
| | crun | crun --systemd-cgroup | runc | runc --systemd-cgroup |
|---|---|---|---|---|
| default config | EINVAL | EACCES | EINVAL | EINVAL |
| `"cpu": { "cpus": "0" }` | EINVAL | OK | OK | OK |
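The `"cpu": { "cpus": "0" }` workaround from the table can be applied to the `config.json` produced by `crun spec` with a few lines of Python. This is a minimal sketch: `pin_container_cpus` and `patch_config_file` are hypothetical helper names, and CPU `0` is only the right choice for the 2-core VM above, where CPU 1 belongs to the exclusive `iamspecial` cpuset.

```python
import json
from pathlib import Path


def pin_container_cpus(config: dict, cpus: str) -> dict:
    """Set linux.resources.cpu.cpus in an OCI runtime config (hypothetical helper)."""
    # setdefault keeps any existing linux/resources/cpu settings intact.
    cpu = config.setdefault("linux", {}).setdefault("resources", {}).setdefault("cpu", {})
    cpu["cpus"] = cpus  # cpulist string, e.g. "0" or "0-1"
    return config


def patch_config_file(path: str, cpus: str) -> None:
    """Rewrite a config.json on disk with the CPU restriction added (hypothetical helper)."""
    p = Path(path)
    config = json.loads(p.read_text())
    pin_container_cpus(config, cpus)
    p.write_text(json.dumps(config, indent=2) + "\n")
```

Running `patch_config_file("config.json", "0")` right after `crun spec` should produce the same `linux.resources` entry as editing the file by hand.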
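So, the main problem here comes from the fact that crun (and runc) try to initialize the newly created cgroup's `cpuset.cpus` with the value taken from the parent cgroup. In the reproduction that value is `0-1`, which overlaps CPU 1 owned by the exclusive `iamspecial` cpuset, so the kernel rejects the write with `EINVAL`. According to the Linux documentation for cgroups v2 (!):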
> An empty value indicates that the cgroup is using the same setting as the nearest cgroup ancestor with a non-empty “cpuset.cpus” or all the available CPUs if none is found.
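Read literally, the quoted v2 rule is a fallback lookup. The sketch below (a hypothetical `effective_cpus_v2`, with ancestor values passed in from the parent up to the root) models only that sentence; it ignores the additional restriction that a child's effective CPUs are limited to its parent's.

```python
from typing import Sequence


def effective_cpus_v2(own: str, ancestors: Sequence[str], all_cpus: str) -> str:
    """Resolve an empty cgroup v2 cpuset.cpus per the quoted rule (hypothetical helper).

    own: this cgroup's cpuset.cpus (may be empty)
    ancestors: cpuset.cpus of each ancestor, nearest (parent) first
    all_cpus: cpulist of all available CPUs, used if every value is empty
    """
    for value in (own, *ancestors):
        if value.strip():  # first non-empty value wins
            return value.strip()
    return all_cpus
```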
I cannot find a similar description for cgroups v1. Is that why the initialization is needed? If so, how does one handle cgroups with exclusive CPUs? Does one have to traverse the entire cpuset tree to find the available CPUs? If so, I suspect this issue won't be fixed, given that cgroups v1 is obsolete anyway.
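In cgroup v1, a newly created cpuset starts with empty `cpuset.cpus` and `cpuset.mems`, and the kernel refuses to attach tasks until both are populated, which is presumably why the runtimes copy the parent's values. A rough sketch of an exclusivity-aware alternative: take the parent's CPUs and subtract those of every sibling with `cpuset.cpu_exclusive` set. It assumes plain cpulist syntax (e.g. `0-3,6`) and looks only at direct siblings, which is enough for the reproduction, where `test1234` is created next to `iamspecial`. All function names are hypothetical, and an empty result means no CPU is left for the new cgroup.

```python
import os
from pathlib import Path
from typing import Iterable, List, Set, Tuple


def parse_cpulist(text: str) -> Set[int]:
    """Parse plain cpulist syntax such as "0-3,6,8-9" into a set of CPU ids."""
    cpus: Set[int] = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # an empty cpuset.cpus file reads as ""
        if "-" in chunk:
            lo, hi = chunk.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(chunk))
    return cpus


def format_cpulist(cpus: Iterable[int]) -> str:
    """Format CPU ids back into cpulist syntax, collapsing consecutive runs."""
    ordered = sorted(set(cpus))
    parts: List[str] = []
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(parts)


def read_siblings(parent_dir: str) -> List[Tuple[str, bool]]:
    """Collect (cpuset.cpus, cpu_exclusive) for every child cgroup of parent_dir."""
    siblings: List[Tuple[str, bool]] = []
    for entry in os.scandir(parent_dir):
        if not entry.is_dir():
            continue  # skip control files; child cgroups are directories
        cpus = Path(entry.path, "cpuset.cpus").read_text()
        exclusive = Path(entry.path, "cpuset.cpu_exclusive").read_text().strip() == "1"
        siblings.append((cpus, exclusive))
    return siblings


def cpus_for_new_child(parent_cpus: str, siblings: Iterable[Tuple[str, bool]]) -> str:
    """Parent's CPUs minus those owned by exclusive siblings ("" if none remain)."""
    available = parse_cpulist(parent_cpus)
    for cpus, exclusive in siblings:
        if exclusive:
            available -= parse_cpulist(cpus)
    return format_cpulist(available)
```

On the VM above, `cpus_for_new_child(Path("/sys/fs/cgroup/cpuset/cpuset.cpus").read_text(), read_siblings("/sys/fs/cgroup/cpuset"))` should yield `0`, matching the manual workaround.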
The secondary problem seems to be that `--cgroup-manager=cgroupfs` in crun ignores `linux.resources.cpu.cpus`. Or rather, it applies the value only after the initialization has happened, since it uses `initialize_cpuset_subsystem()`. Compare that with `--cgroup-manager=systemd`, which uses `initialize_cpuset_subsystem_resources()` and therefore doesn't attempt to put all CPUs into the `cpuset.cpus` file.
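The ordering difference can be reduced to a single decision about the first value written. `initial_cpuset_cpus` is a hypothetical helper, not crun's code: it prefers the container's `linux.resources.cpu.cpus` when set and only falls back to the parent's value otherwise, which is roughly the behavior described for the systemd path.

```python
from typing import Optional


def initial_cpuset_cpus(requested: Optional[str], parent_cpus: str) -> str:
    """Choose the first value written to a freshly created v1 cpuset.cpus.

    Hypothetical helper: requested is linux.resources.cpu.cpus from config.json
    (None when absent); parent_cpus is the parent cgroup's cpuset.cpus.
    """
    if requested is not None and requested.strip():
        return requested.strip()  # honour the container's own restriction first
    return parent_cpus.strip()  # otherwise fall back to copying the parent's value
```

With the workaround configuration, the first write would then be `0` rather than `0-1`, so the exclusive sibling is never overlapped.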