runc icon indicating copy to clipboard operation
runc copied to clipboard

Support for non-prefix mount for cpuset

Open sophy228 opened this issue 5 years ago • 10 comments

On some OS (such as Android), the cpuset cgroup is mounted with "noprefix" option. So there is no "cpuset.xxx" but just "xxx" under cpuset subsystem.

The original issue is here "https://github.com/docker/for-linux/issues/689". Here we can fix it in runc first, then the runc can work with noprefix cpuset at least.

Signed-off-by: Zhongmin Wu [email protected]

sophy228 avatar Jul 30 '19 01:07 sophy228

I'd prefer (since noprefix can be set on any cgroup controller) that we fix this in a much more complete manner. In particular, the writeFile calls in libcontainer/cgroup/fs should be wrapped in a wrapper which tries both[+] (and you pass in the controller name and the suffix). That way the code will also remain far more understandable than copy-pasting the same change to all writeFile callers.

Also, please fix up your comments so it says something like // cgroupfs can be mounted with "noprefix" which strips the controller name. rather than just "on some OS" which doesn't help explain why we need this change.

Thanks.

[+] Ideally, we'd do an fstatfs(2) to check whether noprefix was set, but that's not really necessary.

cyphar avatar Jul 30 '19 04:07 cyphar

From what I remember working on this -- the write logic should inspect the parent cgroup to understand the hierarchy. I remember it might be possible to successfully write non-working files that don't match the mount's intended prefix.

I will look at my patches for this tomorrow.

stealthybox avatar Jul 31 '19 05:07 stealthybox

The parent hierarchy logic should already be done within the rest of libcontainer/cgroup/fs/cpu* (though I admit it's been years since I've really touched this code, so I might be mixing this up with memcg).

cyphar avatar Jul 31 '19 07:07 cyphar

So, any more comment about the patch ?

sophy228 avatar Aug 02 '19 00:08 sophy228

I just tried writing the wrong file ( cpus on a modern cpuset cgroup mount && cpuset.cpus on an older android kernel's noprefix mount ). Both failed permissions as root, so I think this patch is safe to try both writes all the time.

I'm not sure where else the cgroup write behavior for arbitrary files would be documented, but maybe somebody with more experience knows.

[+] Ideally, we'd do an fstatfs(2) to check whether noprefix was set, but that's not really necessary.

^ this would be needed if those writes for the wrong filenames succeeded which is what I was suspicious of.

stealthybox avatar Aug 02 '19 22:08 stealthybox

since noprefix can be set on any cgroup controller

Do we need to do this for more than just the cpuset controller? Previous patches I've read from LXC/D only do it for cpuset IIRC, but if the noprefix mount option is generic for all cgroups, we should wrap all WriteFile() calls related to cgroup controller values.

stealthybox avatar Aug 02 '19 22:08 stealthybox

Sure, if such patch is acceptable, I can work for other patches for other cgroup controllers such as mem, cpu ...

sophy228 avatar Aug 05 '19 00:08 sophy228

ping @cyphar -- opinions on https://github.com/opencontainers/runc/pull/2090#issuecomment-517859203?

stealthybox avatar Aug 12 '19 17:08 stealthybox

Can you get this rebased? Or is there a workaround that accomplishes this as well?

satmandu avatar Jan 11 '23 17:01 satmandu

@cyphar is this PR still relevant? is there sthg else to do regarding this comment . If yes I will be happy to continue working on this PR. I think we have to do the same thing for cgroupv2

fahedouch avatar Mar 08 '23 16:03 fahedouch