runc
runc copied to clipboard
Support for non-prefix mount for cpuset
On some OS (such as Android), the cpuset cgroup is mounted with "noprefix" option. So there is no "cpuset.xxx" but just "xxx" under cpuset subsystem.
The original issue is here "https://github.com/docker/for-linux/issues/689". Here we can fix it in runc first, then the runc can work with noprefix cpuset at least.
Signed-off-by: Zhongmin Wu [email protected]
I'd prefer (since noprefix
can be set on any cgroup controller) that we fix this in a much more complete manner. In particular, the writeFile
calls in libcontainer/cgroup/fs
should be wrapped in a wrapper which tries both[+] (and you pass in the controller name and the suffix). That way the code will also remain far more understandable than copy-pasting the same change to all writeFile
callers.
Also, please fix up your comments so it says something like // cgroupfs can be mounted with "noprefix" which strips the controller name.
rather than just "on some OS" which doesn't help explain why we need this change.
Thanks.
[+] Ideally, we'd do an fstatfs(2)
to check whether noprefix
was set, but that's not really necessary.
From what I remember working on this -- the write logic should inspect the parent cgroup to understand the hierarchy. I remember it might be possible to successfully write non-working files that don't match the mount's intended prefix.
I will look at my patches for this tomorrow.
The parent hierarchy logic should already be done within the rest of libcontainer/cgroup/fs/cpu*
(though I admit it's been years since I've really touched this code, so I might be mixing this up with memcg
).
So, any more comment about the patch ?
I just tried writing the wrong file ( cpus
on a modern cpuset cgroup mount && cpuset.cpus
on an older android kernel's noprefix mount ).
Both failed permissions as root, so I think this patch is safe to try both writes all the time.
I'm not sure where else the cgroup write behavior for arbitrary files would be documented, but maybe somebody with more experience knows.
[+] Ideally, we'd do an fstatfs(2) to check whether noprefix was set, but that's not really necessary.
^ this would be needed if those writes for the wrong filenames succeeded which is what I was suspicious of.
since noprefix can be set on any cgroup controller
Do we need to do this for more than just the cpuset
controller?
Previous patches I've read from LXC/D only do it for cpuset
IIRC, but if the noprefix
mount option is generic for all cgroups, we should wrap all WriteFile()
calls related to cgroup controller values.
Sure, if such patch is acceptable, I can work for other patches for other cgroup controllers such as mem, cpu ...
ping @cyphar -- opinions on https://github.com/opencontainers/runc/pull/2090#issuecomment-517859203?
Can you get this rebased? Or is there a workaround that accomplishes this as well?
@cyphar is this PR still relevant? is there sthg else to do regarding this comment . If yes I will be happy to continue working on this PR.
I think we have to do the same thing for cgroupv2