Resetting CPU affinity does the opposite on 1024+ CPU systems
Description
runc versions 1.3.0 and earlier allowed container processes to run on any CPU defined in cpuset.cpus (spec.linux.resources.cpu.cpus). From v1.3.1 through the currently latest v1.3.3, runc always sets a CPU affinity. Although the intention is to set a CPU affinity mask that allows all CPUs, the implementation sets a mask that allows only CPUs 0-1023, which effectively prevents the container from running on CPUs 1024 and above.
The issue was introduced in PR https://github.com/opencontainers/runc/pull/4858 and the problem remains after https://github.com/opencontainers/runc/pull/4926, too.
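To illustrate the failure mode, here is a simplified model in Go. It is not runc's or golang.org/x/sys/unix's actual code: the `fixedCPUSet` type, its 1024-bit size, and the helper names are assumptions made for illustration. It shows how a "fill everything" operation on a fixed-size mask ends up excluding CPU 1024 once it is intersected with cpuset.cpus.

```go
package main

import "fmt"

// fixedSetBits models a fixed-size affinity mask of 1024 bits.
// The size is an assumption chosen to match the CPUs 0-1023 limit described above.
const fixedSetBits = 1024

// fixedCPUSet is a hypothetical stand-in for a fixed-size CPU set type.
type fixedCPUSet [fixedSetBits / 64]uint64

// fill sets every bit that the fixed-size mask can represent.
func (s *fixedCPUSet) fill() {
	for i := range s {
		s[i] = ^uint64(0)
	}
}

// isSet reports whether cpu is in the mask.
// CPUs beyond the mask size can never be set.
func (s *fixedCPUSet) isSet(cpu int) bool {
	if cpu < 0 || cpu >= fixedSetBits {
		return false
	}
	return s[cpu/64]&(1<<(uint(cpu)%64)) != 0
}

// effectiveCPUs returns the CPUs from cpusetCPUs that the mask also allows,
// modeling the intersection of the affinity mask with cpuset.cpus.
func effectiveCPUs(mask *fixedCPUSet, cpusetCPUs []int) []int {
	var out []int
	for _, cpu := range cpusetCPUs {
		if mask.isSet(cpu) {
			out = append(out, cpu)
		}
	}
	return out
}

func main() {
	var mask fixedCPUSet
	mask.fill() // intended as "allow all CPUs"

	// cpuset.cpus = "1023,1024", as in the reproduction steps.
	fmt.Println(effectiveCPUs(&mask, []int{1023, 1024})) // prints [1023]
}
```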
Steps to reproduce the issue
- Start a container that should run on CPUs "1023,1024"
- Check Cpus_allowed_list in /proc/PID/status while the container process is running. The process is allowed to use only CPU 1023.
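The check in the second step can be scripted. The following is a minimal sketch, assuming the usual `Cpus_allowed_list:` line format of /proc/PID/status; the function names `cpusAllowedList` and `parseCPUList` are made up for this example.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// cpusAllowedList extracts the Cpus_allowed_list value from the text
// of a /proc/PID/status file.
func cpusAllowedList(status string) (string, bool) {
	const key = "Cpus_allowed_list:"
	for _, line := range strings.Split(status, "\n") {
		if strings.HasPrefix(line, key) {
			return strings.TrimSpace(strings.TrimPrefix(line, key)), true
		}
	}
	return "", false
}

// parseCPUList expands a list such as "0-3,8" into individual CPU numbers.
func parseCPUList(list string) ([]int, error) {
	var cpus []int
	for _, part := range strings.Split(list, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		lo, hi := part, part
		if i := strings.IndexByte(part, '-'); i >= 0 {
			lo, hi = part[:i], part[i+1:]
		}
		first, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("bad CPU %q: %w", lo, err)
		}
		last, err := strconv.Atoi(hi)
		if err != nil {
			return nil, fmt.Errorf("bad CPU %q: %w", hi, err)
		}
		if last < first {
			return nil, fmt.Errorf("bad range %q", part)
		}
		for cpu := first; cpu <= last; cpu++ {
			cpus = append(cpus, cpu)
		}
	}
	return cpus, nil
}

func main() {
	// Inspect the given PID, or this process if no PID is given.
	path := "/proc/self/status"
	if len(os.Args) > 1 {
		path = "/proc/" + os.Args[1] + "/status"
	}
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	list, ok := cpusAllowedList(string(data))
	if !ok {
		fmt.Fprintln(os.Stderr, "Cpus_allowed_list not found")
		os.Exit(1)
	}
	cpus, err := parseCPUList(list)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("Cpus_allowed_list=%s (%d CPUs)\n", list, len(cpus))
}
```

Run with the container process PID as the only argument. On an affected system the list should contain 1023 but not 1024.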
Describe the results you received and expected
Received: the container process is allowed to run only on CPU 1023. Expected: the container should be allowed to run on all CPUs defined in the spec.
What version of runc are you using?
v1.3.3
Host OS information
No response
Host kernel information
No response
see https://github.com/golang/go/issues/75566
Yeah, this is an unfortunate limitation of golang.org/x/sys/unix. Maybe we should just call the syscall directly...
Actually, besides the problem with >1024 CPUs, there is another one: an affinity is set at all, even when none was requested. sched_setaffinity() should be called only if explicitly requested via the execCPUAffinity field.
@kad We need to "unset" the affinity if it was not specified, this was explicitly done in #4858 for a reason.
We've had customers that ran into performance issues because they triggered runc run (or maybe it was via podman) from systemd services that are pinned to a CPU but they don't want their workload to also be pinned to the same CPU by default. (#4815 was opened because of an actual customer issue along those lines.)
If you want to configure a particular CPU pin, you should use the new CPU pinning support we have in runtime-spec 1.4.
@cyphar, what's the new pinning support that you referred to in runtime-spec 1.4? Any pointers to commits/PRs?
Sorry, two mistakes in that comment:
- runtime-spec v1.3, not v1.4 (which doesn't exist yet).
- I thought https://github.com/opencontainers/runtime-spec/pull/1296 was already merged but it isn't. At the moment we only have execCPUAffinity.
No problem, and thanks for the clarification, @cyphar!
We'll need to address setting those affinities, too, to support execCPUAffinity on 1024+ CPU systems.
I'd limit the scope of this issue to resetting CPU affinity and prioritize fixing that first. This problem affects every container run, while using execCPUAffinity is a special case. And this issue prevents using the latest runc on, for example, Google's X4 instances with 1440 and 1920 vCPUs today.