
Resetting CPU affinity does the opposite on 1024+ CPU systems

Open askervin opened this issue 1 month ago • 7 comments

Description

runc versions 1.3.0 and earlier allowed container processes to run on any CPU defined in cpuset.cpus (spec.linux.resources.cpu.cpus). From v1.3.1 up to the current latest release, v1.3.3, runc always sets a CPU affinity mask. Although the intention is to set a mask that allows all CPUs, the implementation sets a mask that allows only CPUs 0-1023, which prevents container processes from running on CPU 1024 and above.

The issue was introduced in PR https://github.com/opencontainers/runc/pull/4858 and the problem remains after https://github.com/opencontainers/runc/pull/4926, too.
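To make the arithmetic concrete, here is a minimal stdlib-only Go model of the failure. It assumes, per the comments in this thread, that the reset goes through a fixed 1024-bit CPU set like golang.org/x/sys/unix's CPUSet. The kernel treats bits past the end of a shorter user-supplied mask as cleared, then intersects the requested mask with the cgroup's cpuset.cpus, so "set every bit" can never include CPU 1024. The type and function names are hypothetical and are not runc's code.

```go
package main

import "fmt"

// fixedMaskBits models a fixed-size CPU set of 1024 bits
// (the size used by glibc's cpu_set_t and x/sys/unix.CPUSet).
const fixedMaskBits = 1024

// fixedMask is a hypothetical stand-in for a fixed-size CPU set:
// bit i set means CPU i is allowed. It cannot represent CPU 1024+.
type fixedMask [fixedMaskBits / 64]uint64

// setAll marks every CPU the fixed mask can represent (0-1023).
func (m *fixedMask) setAll() {
	for i := range m {
		m[i] = ^uint64(0)
	}
}

// has reports whether cpu is set; CPUs beyond the mask are never set,
// mirroring how the kernel clears bits past a short user mask.
func (m *fixedMask) has(cpu int) bool {
	if cpu < 0 || cpu >= fixedMaskBits {
		return false
	}
	return m[cpu/64]&(1<<uint(cpu%64)) != 0
}

// effectiveCPUs models the kernel intersecting the requested affinity
// with the cgroup's cpuset.cpus.
func effectiveCPUs(requested *fixedMask, cpuset []int) []int {
	var out []int
	for _, cpu := range cpuset {
		if requested.has(cpu) {
			out = append(out, cpu)
		}
	}
	return out
}

func main() {
	var m fixedMask
	m.setAll() // intended meaning: "allow every CPU"
	fmt.Println(effectiveCPUs(&m, []int{1023, 1024})) // prints [1023]
}
```

With cpuset.cpus set to "1023,1024", only CPU 1023 survives the intersection, which matches the reproduction steps below.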

Steps to reproduce the issue

  1. Start a container whose cpuset allows CPUs "1023,1024" (spec.linux.resources.cpu.cpus = "1023,1024").
  2. While the process is running, check Cpus_allowed_list in /proc/PID/status. Only CPU 1023 is allowed.
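Step 2 can be scripted. The following is a minimal Go sketch that reads Cpus_allowed_list from /proc/PID/status, expands the kernel's list format (for example "0-3,8,1023"), and reports whether CPUs 1023 and 1024 are allowed. The helper names `parseCPUList` and `allowedCPUs` are hypothetical; passing no argument checks the program's own process via /proc/self.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseCPUList expands a kernel CPU list such as "0-3,8,1023" into a set.
func parseCPUList(s string) (map[int]bool, error) {
	cpus := make(map[int]bool)
	s = strings.TrimSpace(s)
	if s == "" {
		return cpus, nil
	}
	for _, part := range strings.Split(s, ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		first, err := strconv.Atoi(lo)
		if err != nil {
			return nil, fmt.Errorf("bad CPU %q: %w", part, err)
		}
		last := first
		if isRange {
			if last, err = strconv.Atoi(hi); err != nil {
				return nil, fmt.Errorf("bad CPU range %q: %w", part, err)
			}
		}
		for cpu := first; cpu <= last; cpu++ {
			cpus[cpu] = true
		}
	}
	return cpus, nil
}

// allowedCPUs reads the Cpus_allowed_list line of /proc/<pid>/status.
func allowedCPUs(pid string) (map[int]bool, error) {
	f, err := os.Open("/proc/" + pid + "/status")
	if err != nil {
		return nil, err
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if v, ok := strings.CutPrefix(sc.Text(), "Cpus_allowed_list:"); ok {
			return parseCPUList(v)
		}
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	return nil, fmt.Errorf("no Cpus_allowed_list line for pid %s", pid)
}

func main() {
	pid := "self"
	if len(os.Args) > 1 {
		pid = os.Args[1] // e.g. the PID of the container process
	}
	cpus, err := allowedCPUs(pid)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("CPU 1023 allowed: %v, CPU 1024 allowed: %v\n", cpus[1023], cpus[1024])
}
```

On an affected runc, running this against the container's PID should report CPU 1024 as not allowed even though it is in the cpuset.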

Describe the results you received and expected

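Received: the container process is allowed to run only on CPU 1023. Expected: the container should be allowed to run on all CPUs defined in the spec (here, CPUs 1023 and 1024).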

What version of runc are you using?

v1.3.3

Host OS information

No response

Host kernel information

No response

askervin • Nov 18 '25 09:11

See https://github.com/golang/go/issues/75566.

ningmingxiao • Nov 18 '25 11:11

Yeah, this is an unfortunate limitation of golang.org/x/sys/unix. Maybe we should just call the syscall directly...

cyphar • Nov 18 '25 12:11
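A rough sketch of what "calling the syscall directly" could look like, using only Go's standard `syscall` package on Linux. This is an illustration under stated assumptions, not runc's fix: it sizes the mask from /sys/devices/system/cpu/possible (for example "0-1919") instead of a fixed 1024 bits, sets every bit, and passes the mask's byte length to sched_setaffinity(2). The helper names are hypothetical, and the mask is built from `uint64` words, which assumes a 64-bit platform where that matches the kernel's `unsigned long`.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
	"syscall"
	"unsafe"
)

// maxPossibleCPU returns the highest CPU number in
// /sys/devices/system/cpu/possible, e.g. 1919 for "0-1919".
func maxPossibleCPU() (int, error) {
	b, err := os.ReadFile("/sys/devices/system/cpu/possible")
	if err != nil {
		return 0, err
	}
	list := strings.TrimSpace(string(b))
	// The highest CPU is the last number in the list.
	return strconv.Atoi(list[strings.LastIndexAny(list, ",-")+1:])
}

// fullMask returns a mask with bits 0..maxCPU set, in whole 64-bit words
// (the kernel expects a length that is a multiple of sizeof(long)).
func fullMask(maxCPU int) []uint64 {
	mask := make([]uint64, maxCPU/64+1)
	for cpu := 0; cpu <= maxCPU; cpu++ {
		mask[cpu/64] |= 1 << uint(cpu%64)
	}
	return mask
}

// resetAffinity allows the calling thread to run on every possible CPU;
// the kernel still intersects this with the cgroup's cpuset.cpus.
func resetAffinity() error {
	maxCPU, err := maxPossibleCPU()
	if err != nil {
		return err
	}
	mask := fullMask(maxCPU)
	// pid 0 means "the calling thread"; affinity is per thread.
	_, _, errno := syscall.Syscall(syscall.SYS_SCHED_SETAFFINITY,
		0, uintptr(len(mask)*8), uintptr(unsafe.Pointer(&mask[0])))
	if errno != 0 {
		return errno
	}
	return nil
}

func main() {
	runtime.LockOSThread() // keep this goroutine on the thread we modify
	defer runtime.UnlockOSThread()
	if err := resetAffinity(); err != nil {
		fmt.Fprintln(os.Stderr, "sched_setaffinity:", err)
		os.Exit(1)
	}
	fmt.Println("affinity reset to all possible CPUs")
}
```

Sizing from the "possible" CPU list rather than a compile-time constant is the key difference from a fixed 1024-bit set; on a 1920-vCPU host the mask is 30 words instead of 16.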

Actually, besides the problem with >1024 CPUs, another problem is that any affinity is set at all, even when none was requested. sched_setaffinity() should be called only if explicitly requested via the execCPUAffinity field.

kad • Nov 18 '25 14:11

@kad We need to "unset" the affinity if it was not specified; #4858 did this explicitly, and for a reason.

We've had customers run into performance issues because they triggered runc run (or maybe it was via podman) from systemd services pinned to a CPU. Because CPU affinity is inherited across fork and exec, their workloads ended up pinned to that same CPU, which they did not want by default. (#4815 was opened because of an actual customer issue along those lines.)

If you want to configure a particular CPU pin, you should use the new CPU pinning support we have in runtime-spec 1.4.

cyphar • Nov 19 '25 06:11

@cyphar, what's the new pinning support that you referred to in runtime-spec 1.4? Any pointers to commits/PRs?

askervin • Nov 19 '25 14:11

Sorry, two mistakes in that comment:

  1. runtime-spec v1.3, not v1.4 (which doesn't exist yet).
  2. I thought https://github.com/opencontainers/runtime-spec/pull/1296 was already merged but it isn't. At the moment we only have execCPUAffinity.

cyphar • Nov 19 '25 23:11

No problem and thanks for clarification, @cyphar!

We'll need to address setting those affinities, too, to support execCPUAffinity on 1024+ CPU systems.

I'd keep the scope of this issue to resetting CPU affinity and prioritize fixing it first: this problem affects every container, whereas using execCPUAffinity is a special case. This issue currently prevents using the latest runc on, for example, Google's X4 instances with 1440 and 1920 vCPUs.

askervin • Nov 20 '25 06:11