gvisor
EINVAL writing cgroup.procs when creating container
Description
We're seeing a rare issue where creating a gVisor container fails when it tries to join the cgroup it just created: the kernel returns EINVAL when runsc writes its process ID into the cgroup hierarchy. The only related report we've come across is opencontainers/runc#1884, but we don't have enough occurrences to reproduce it or verify that it's related.
Is there another situation in which the kernel would return EINVAL when a process tries to join a cgroup? I doubt the fix runc implemented (a C library shim that runs before the Go runtime starts) is an acceptable approach here, but maybe a simple retry would be enough?
This is on runsc release-20220905; the host kernel is 5.15.
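For illustration, here's a minimal sketch of the kind of retry I have in mind. This is hypothetical, not runsc's actual code; the package/function names, retry count, and backoff are made up.

// Hypothetical sketch: retry the cgroup.procs write a few times when the
// kernel transiently returns EINVAL. Not runsc code; for illustration only.
package cgroupretry

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"syscall"
	"time"
)

// joinCgroupWithRetry writes pid to <cgroupDir>/cgroup.procs, which moves the
// process into that cgroup, retrying briefly on EINVAL.
func joinCgroupWithRetry(cgroupDir string, pid int) error {
	procsFile := filepath.Join(cgroupDir, "cgroup.procs")
	var err error
	for attempt := 0; attempt < 5; attempt++ {
		err = os.WriteFile(procsFile, []byte(strconv.Itoa(pid)), 0o644)
		if err == nil {
			return nil
		}
		// Only retry the transient EINVAL seen in this issue; fail fast otherwise.
		if !errors.Is(err, syscall.EINVAL) {
			return err
		}
		time.Sleep(10 * time.Millisecond)
	}
	return fmt.Errorf("joining cgroup %q: %w", cgroupDir, err)
}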
Steps to reproduce
No response
runsc version
runsc version release-20220905.0
docker version (if using docker)
No response
uname
No response
kubectl (if using Kubernetes)
No response
repo state (if built from source)
No response
runsc debug logs (if available)
No response
Hi @jseba, are you able to get any runsc logs from the failing containers? That could help in diagnosing the exact issue. Also, does the issue happen more frequently under high exec load, as in the other bug?
If there's a way to get debug logs only when the container is created and not while it executes, I might be able to get something. Otherwise we simply don't have the storage space for these logs, since this issue hits only once or twice a month even with as many containers as we launch.
@jseba, please apply the patch from #8076 and add --debug-command=create (or --debug-command=!boot,gofer,events to drop the more verbose logs) to the runtime options to restrict which commands are logged.
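For reference, assuming the runtime is configured through Docker (the thread doesn't say; the same flags would go wherever your runtime manager passes runsc options), the runtime entry in /etc/docker/daemon.json could look roughly like this, with the binary path adjusted for your install:

{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc",
      "runtimeArgs": [
        "--debug",
        "--debug-log=/var/log/runsc/",
        "--debug-command=create"
      ]
    }
  }
}

With --debug-log pointing at a directory (trailing slash), runsc should write a separate log file per command, so combined with --debug-command=create only the create logs accumulate.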
Okay, I finally managed to get some debug logs from the create command; nothing obvious stands out to me. This is from version release-20230214.0.
D0223 13:14:52.814844 3874296 container.go:185] Create container, cid: service_7c5c93f7-9cc9-3811-e398-ef3d2374ce89_e75e9298, rootDir: "/var/run/runsc"
D0223 13:14:52.814995 3874296 container.go:244] Creating new sandbox for container, cid: service_7c5c93f7-9cc9-3811-e398-ef3d2374ce89_e75e9298
D0223 13:14:52.815636 3874296 cgroup.go:410] New cgroup for pid: self, *cgroup.cgroupV2: &{Mountpoint:/sys/fs/cgroup Path:containers/service_group/service_7c5c93f7-9cc9-3811-e398-ef3d2374ce89_e75e9298/gv Controllers:[cpuset cpu io memory hugetlb pids misc] Own:[]}
D0223 13:14:52.815685 3874296 cgroup_v2.go:129] Installing cgroup path "/sys/fs/cgroup/containers/service_group/service_7c5c93f7-9cc9-3811-e398-ef3d2374ce89_e75e9298/gv"
D0223 13:14:52.835990 3874296 cgroup.go:116] Setting "/sys/fs/cgroup/containers/service_group/service_7c5c93f7-9cc9-3811-e398-ef3d2374ce89_e75e9298/gv/cpu.max" to "43478 10000"
D0223 13:14:52.836277 3874296 cgroup.go:116] Setting "/sys/fs/cgroup/containers/service_group/service_7c5c93f7-9cc9-3811-e398-ef3d2374ce89_e75e9298/gv/memory.max" to "1887436800"
D0223 13:14:52.836337 3874296 cgroup.go:116] Setting "/sys/fs/cgroup/containers/service_group/service_7c5c93f7-9cc9-3811-e398-ef3d2374ce89_e75e9298/gv/pids.max" to "4096"
D0223 13:14:52.837098 3874296 cgroup.go:116] Setting "/sys/fs/cgroup/containers/service_group/service_7c5c93f7-9cc9-3811-e398-ef3d2374ce89_e75e9298/gv/cgroup.procs" to "0"
D0223 13:14:52.837291 3874296 cgroup_v2.go:215] Restoring cgroup "/sys/fs/cgroup/containers/service_group/service_7c5c93f7-9cc9-3811-e398-ef3d2374ce89_e75e9298/cc"
D0223 13:14:52.837332 3874296 cgroup.go:116] Setting "/sys/fs/cgroup/containers/service_group/service_7c5c93f7-9cc9-3811-e398-ef3d2374ce89_e75e9298/mgmt/cgroup.procs" to "0"
D0223 13:14:52.837945 3874296 container.go:723] Destroy container, cid: service_7c5c93f7-9cc9-3811-e398-ef3d2374ce89_e75e9298
W0223 13:14:52.838090 3874296 util.go:64] FATAL ERROR: creating container: write /sys/fs/cgroup/containers/service_group/service_7c5c93f7-9cc9-3811-e398-ef3d2374ce89_e75e9298/gv/cgroup.procs: invalid argument
W0223 13:14:52.838198 3874296 main.go:261] Failure to execute command, err: 1
Some context on our cgroup layout: we have a top-level cgroup for all containers, containers, which holds the resource constraints for every container running on the machine combined. Below that is a cgroup for all the containers that make up a service group, service_group, which limits the entire group relative to other groups. The actual containers go below this, and each container gets two cgroups because of the cgroup v2 "no internal processes" rule, which forbids processes in non-leaf nodes of the tree: one for the management processes/sidecars, mgmt, and one for the gVisor sentry and gofer, gv. We don't enable cgroup threading, so all accounting should be by process.
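Concretely, using the paths from the log above (container ID abbreviated), the hierarchy looks roughly like this:

/sys/fs/cgroup/
└── containers/                      limits for everything on the machine
    └── service_group/               limits for the whole service group
        └── service_<id>/            one directory per container
            ├── mgmt/                management processes / sidecars
            └── gv/                  gVisor sentry and gofer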
Let me know if there's anything more I can provide or if there's a patch you'd like me to apply and try.
A friendly reminder that this issue had no activity for 120 days.
The only related report we've come across is opencontainers/runc#1884, but we don't have enough occurrences to reproduce it or verify that it's related.
That could be the root cause of this issue. BTW, it was fixed in the kernel: https://github.com/torvalds/linux/commit/df16b71c686cb096774e30153c9ce6756450796c
A friendly reminder that this issue had no activity for 120 days.
This issue has been closed due to lack of activity.