Introduce CPUAffinity process property instead of execCPUAffinity
This change introduces a more generic CPUAffinity property of Process to specify the desired CPU affinity during create, start, and exec operations.
As originally discussed in PR #1253, the existing implementation covers only the exec use case, whereas setting affinity for OCI hooks and the initial container process would benefit a wider set of workloads.
@kolyshkin PTAL
Given issues like https://github.com/golang/sys/pull/259, I think it might be nice to have a way to indicate "all CPUs" as a way to reset affinity (but without doing 0-1024, which is ~300x slower than the memset approach). In runc we implicitly do this now, but users might want to reset affinity for other stages explicitly.
Theoretically, the higher-level runtime that generates the OCI spec could fill it in with the right value based on detected system information, e.g. using the result of sched_getaffinity(2) in the parent process.
fyi @bitoku
@kad
My experience is that this rarely happens -- usually higher-level runtimes either hide new knobs like this (requiring you to patch config.json through experimental or unsupported hacks) or they transparently forward the values to the lower-level runtime without adding new functionality. Even if they do implement it, there is no guarantee that the behaviour or syntax will be standardised between runtimes, which leads to more problems than it solves.
Given that we have seen practical issues with container runtimes being spawned with suboptimal CPU affinity values, I would suggest that having an "all" or "max" special value would be a good idea.
Also, you do not need to detect anything with sched_getaffinity(2) for this -- you just need to memset(&cpuset, 0xFF, sizeof(cpuset)) to reset the affinity to the maximum possible value. In fact, you don't want to use sched_getaffinity(2) or Go's runtime.NumCPU, because they give you the current affinity, which is precisely the value you don't want.
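To make the memset point concrete, here is a minimal Go sketch that models a 1024-bit cpu_set_t as 16 64-bit words (the 1024 size is an assumption matching glibc's default CPU_SETSIZE). The hypothetical fillAll writes every word at once, like memset with 0xFF, while setRange expands "0-1023" one CPU at a time; both produce the same set, but the first touches 16 words instead of performing 1024 per-bit updates. No timing is claimed here.

```go
package main

import (
	"fmt"
	"math/bits"
)

// cpuSet models a 1024-bit cpu_set_t as 16 64-bit words (assumed size).
type cpuSet [16]uint64

// fillAll sets every bit by writing whole words, which is what
// memset(&cpuset, 0xFF, sizeof(cpuset)) does in C.
func (s *cpuSet) fillAll() {
	for i := range s {
		s[i] = ^uint64(0)
	}
}

// setRange sets CPUs lo..hi one bit at a time, the equivalent of
// expanding a "0-1023" list CPU by CPU.
func (s *cpuSet) setRange(lo, hi int) {
	for cpu := lo; cpu <= hi; cpu++ {
		s[cpu/64] |= 1 << (uint(cpu) % 64)
	}
}

// count returns the number of CPUs present in the set.
func (s *cpuSet) count() int {
	n := 0
	for _, w := range s {
		n += bits.OnesCount64(w)
	}
	return n
}

func main() {
	var a, b cpuSet
	a.fillAll()
	b.setRange(0, len(b)*64-1)
	fmt.Println(a == b, a.count()) // true 1024
}
```

Note that neither path consults the current affinity, which is the property wanted for an explicit reset.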
My point was that the upper-level runtime might be pinned to a subset of CPUs (e.g. an "infra reserved" partition of the system). An OCI runtime spawned from it inherits the affinity of its parent process, so calling sched_getaffinity early at OCI runtime start should yield the parent runtime's actual affinity.
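A minimal Linux-only Go sketch of reading that inherited affinity early at startup, using only the standard library via a raw sched_getaffinity(2) syscall. The function name inheritedCPUs is hypothetical, and the 1024-bit buffer is an assumption: on hosts with more possible CPUs the kernel rejects a buffer this small with EINVAL.

```go
//go:build linux

package main

import (
	"fmt"
	"math/bits"
	"runtime"
	"syscall"
	"unsafe"
)

// affinityMask is a 1024-bit buffer for sched_getaffinity(2) (assumed size).
type affinityMask [16]uint64

// inheritedCPUs returns the CPUs the calling thread may run on. In a
// freshly spawned OCI runtime this reflects the affinity inherited from
// its parent (e.g. containerd or cri-o).
func inheritedCPUs() ([]int, error) {
	var mask affinityMask
	_, _, errno := syscall.RawSyscall(syscall.SYS_SCHED_GETAFFINITY,
		0, // pid 0 means the calling thread
		uintptr(unsafe.Sizeof(mask)),
		uintptr(unsafe.Pointer(&mask)))
	if errno != 0 {
		return nil, errno
	}
	var cpus []int
	for i, word := range mask {
		for word != 0 {
			bit := bits.TrailingZeros64(word)
			cpus = append(cpus, i*64+bit)
			word &= word - 1 // clear the lowest set bit
		}
	}
	return cpus, nil
}

func main() {
	cpus, err := inheritedCPUs()
	if err != nil {
		fmt.Println("sched_getaffinity:", err)
		return
	}
	fmt.Println("inherited CPUs:", cpus)
	fmt.Println("runtime.NumCPU():", runtime.NumCPU())
}
```

This is the value a hypothetical "default" keyword could resolve to; it is exactly what the memset-style "all" reset deliberately ignores.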
I don't mind adding the special value all; I see value in it and will update the PR.
However, I'm also thinking of additional special values. For example, default, which would use the result of sched_getaffinity to reset to the inherited default affinity? Or online, which would match the content of /sys/devices/system/cpu/online, although I suspect sysfs might not always be available in some scenarios.
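A minimal Go sketch of how an affinity value with the special keyword all might be parsed. It assumes the value is a cpulist string such as "0-3,8", the same format as /sys/devices/system/cpu/online, and it assumes a 1024-CPU limit. The function parseAffinity is hypothetical. The default and online keywords are left out because they would need sched_getaffinity or a sysfs read rather than pure parsing.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// maxCPUs is an assumption: the 1024-CPU limit of a default cpu_set_t.
const maxCPUs = 1024

// parseAffinity parses a cpulist string such as "0-3,8,10-11" into sorted
// CPU numbers. "all" expands to every CPU below maxCPUs without consulting
// the current affinity.
func parseAffinity(s string) ([]int, error) {
	s = strings.TrimSpace(s)
	var set [maxCPUs]bool
	if s == "all" {
		for i := range set {
			set[i] = true
		}
	} else {
		for _, part := range strings.Split(s, ",") {
			loStr, hiStr, isRange := strings.Cut(part, "-")
			lo, err := strconv.Atoi(loStr)
			if err != nil {
				return nil, fmt.Errorf("invalid CPU %q: %w", part, err)
			}
			hi := lo
			if isRange {
				if hi, err = strconv.Atoi(hiStr); err != nil {
					return nil, fmt.Errorf("invalid range %q: %w", part, err)
				}
			}
			if lo < 0 || hi < lo || hi >= maxCPUs {
				return nil, fmt.Errorf("range %q out of bounds", part)
			}
			for c := lo; c <= hi; c++ {
				set[c] = true
			}
		}
	}
	var cpus []int
	for c, ok := range set {
		if ok {
			cpus = append(cpus, c)
		}
	}
	return cpus, nil
}

func main() {
	for _, v := range []string{"0-3,8", "all", "5-2"} {
		cpus, err := parseAffinity(v)
		if err != nil {
			fmt.Println(v, "->", err)
			continue
		}
		fmt.Println(v, "->", len(cpus), "CPUs")
	}
}
```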
Having all bits set might also be considered unwanted. As I mentioned, there are setups where runtimes such as containerd/cri-o are restricted to a subset of CPUs, and in those setups it may be undesirable for the lower-layer OCI runtime to use CPU time on other cores.
A naive question -- is there a use case where we want different CPU affinities for different OCI hooks?
@kolyshkin I think the point is to have the affinity change at different stages rather than it be hook-specific -- hooks will probably inherit them but hooks can also set their own affinity if they want to. The naming is similar to hooks because both correspond to runtime lifecycle stages.
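A minimal Go sketch of the stage-based (rather than hook-specific) idea. The map type CPUAffinity, the method affinityFor, and the use of the OCI hook names createRuntime, createContainer and startContainer as keys are all hypothetical; the PR's actual field names are not shown in this thread. A stage with no entry keeps whatever affinity was inherited, and hooks run at that stage would inherit the affinity the runtime set unless they change it themselves.

```go
package main

import "fmt"

// Stage keys borrowed from the OCI hook lifecycle (assumed naming).
const (
	CreateRuntime   = "createRuntime"
	CreateContainer = "createContainer"
	StartContainer  = "startContainer"
)

// CPUAffinity is a hypothetical per-stage config: each value is a cpulist
// string such as "0-3" or the special value "all".
type CPUAffinity map[string]string

// affinityFor returns the affinity to apply at a stage and whether the
// runtime should change it at all; false means "inherit".
func (a CPUAffinity) affinityFor(stage string) (string, bool) {
	v, ok := a[stage]
	return v, ok && v != ""
}

func main() {
	aff := CPUAffinity{CreateRuntime: "0-1", StartContainer: "all"}
	for _, s := range []string{CreateRuntime, CreateContainer, StartContainer} {
		if v, ok := aff.affinityFor(s); ok {
			fmt.Printf("%s: set affinity to %s\n", s, v)
		} else {
			fmt.Printf("%s: inherit\n", s)
		}
	}
}
```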
@kolyshkin @giuseppe @cyphar update pushed with following changes, as suggested:
- Markdown formatting fixed
- introduced special value all
- field names changed to match the hooks naming scheme
- codespell fixes (sigh)