runc init hangs on a node where containerd is set up
Description
I found some D-state processes on a node where containerd is set up:
5 D root 14378 13862 0 80 0 - 269979 refrig Sep30 ? 00:00:00 /usr/local/bin/runc init
5 D root 14392 13587 0 80 0 - 270107 refrig Sep30 ? 00:00:00 /usr/local/bin/runc init
0 S root 278169 276735 0 80 0 - 1007 pipe_r 00:44 pts/2 00:00:00 grep --color=auto D
root@hostname:~# cat /proc/14378/stack
[<0>] __refrigerator+0x4c/0x130
[<0>] unix_stream_data_wait+0x1fa/0x210
[<0>] unix_stream_read_generic+0x50d/0xa60
[<0>] unix_stream_recvmsg+0x88/0x90
[<0>] sock_recvmsg+0x70/0x80
[<0>] sock_read_iter+0x8f/0xf0
[<0>] new_sync_read+0x180/0x190
[<0>] vfs_read+0xff/0x1a0
[<0>] ksys_read+0xb1/0xe0
[<0>] __x64_sys_read+0x19/0x20
[<0>] do_syscall_64+0x5c/0xc0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
root@hostname:~# uptime
01:32:12 up 28 days, 38 min, 2 users, load average: 29.57, 31.53, 31.98
root@hostname:~# systemctl status containerd
● containerd.service - containerd container runtime
Loaded: loaded (/etc/systemd/system/containerd.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2024-09-30 00:53:45 -07; 4 weeks 0 days ago
root@hostname:~# ps -eo pid,lstart,cmd,state |grep 14378
14378 Mon Sep 30 00:53:38 2024 /usr/local/bin/runc init D
root@hostname:~# stat /var/containerd/containerd.sock
File: /var/containerd/containerd.sock
Size: 0 Blocks: 0 IO Block: 4096 socket
Device: 10303h/66307d Inode: 1082291752 Links: 1
Access: (0660/srw-rw----) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2024-10-28 00:45:08.361633324 -0700
Modify: 2024-09-30 00:53:45.666038162 -0700
Change: 2024-09-30 00:53:45.666038162 -0700
Birth: 2024-09-30 00:53:45.666038162 -0700
The runc init process was started before /var/containerd/containerd.sock changed. I think there is some race here? Either way, I think the runc process should time out and exit rather than hang.
Steps to reproduce the issue
No response
Describe the results you received and expected
The runc init process hangs. Expected: no D-state processes.
What version of runc are you using?
~# runc --version
runc version 1.1.2
commit: c4f88bc9
spec: 1.0.2-dev
go: go1.17.13
libseccomp: 2.5.3
Host OS information
~# cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.5 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.5 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
BUILD_ID="ubuntu-240918-061134"
Host kernel information
~# uname -a
Linux tess-node-ttbts-tess134.stratus.lvs.ebay.com 5.15.0-26-generic #26 SMP Wed Sep 18 09:16:49 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
/kind bug
runc version 1.1.2
Could you please update runc to v1.1.14 to check whether this situation exists or not? https://github.com/opencontainers/runc/releases/tag/v1.1.14
This probably means runc was killed in the middle of container creation, and thus its child (runc init) was left behind. I vaguely remember we did something about it, so yes, it makes sense to try the latest runc 1.2.0 or a newer 1.1.x release (the latest being 1.1.15 ATM).
Being stuck in __refrigerator means that the code is in a frozen cgroupv2 cgroup. I'm pretty sure we had some patches in the past 2 years that fixed this issue?
Right! There were fixes in #3223, but they made it into v1.1.0. We might have some more fixes on top of this though; plus, I guess, someone could freeze a cgroup mid-flight, resulting in the same stuck runc init.
@smileusd can you check whether the cgroups these runc init processes are in are frozen?
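For anyone who wants to check this, here is a minimal standalone sketch (not runc code; the file name and behavior are just an illustration) that prints the freeze state of the cgroup a given PID belongs to. It assumes a cgroup v2 unified hierarchy mounted at /sys/fs/cgroup; on cgroup v1, read freezer.state under the freezer controller instead, as in the next comment:

// freezecheck.go - illustrative sketch, not part of runc: print the cgroup v2
// freeze state of the cgroup that a given PID is in.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: freezecheck <pid>")
		os.Exit(1)
	}
	pid := os.Args[1]

	// /proc/<pid>/cgroup contains lines like "0::/kubepods.slice/.../cri-containerd-<id>.scope".
	data, err := os.ReadFile(filepath.Join("/proc", pid, "cgroup"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, line := range strings.Split(strings.TrimSpace(string(data)), "\n") {
		parts := strings.SplitN(line, ":", 3)
		if len(parts) != 3 || parts[1] != "" {
			continue // only look at the cgroup v2 ("0::") entry
		}
		dir := filepath.Join("/sys/fs/cgroup", parts[2])
		// cgroup.freeze reads "1" when the cgroup is frozen, "0" otherwise.
		state, err := os.ReadFile(filepath.Join(dir, "cgroup.freeze"))
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("%s: cgroup.freeze=%s\n", dir, strings.TrimSpace(string(state)))
	}
}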
Met the same issue with runc 1.1.12 and k3s 1.29.4:
# cat /sys/fs/cgroup/freezer/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podfc602c7f_74d9_4696_bd15_5e3a6433e012.slice/cri-containerd-781fbd52240e017d380aa3cccf42ef379a50c32c703363d0b1e9c1fb10bf17b1.scope/freezer.state
FROZEN
The runc process may be killed because of the context timeout (which is the gRPC call timeout from kubelet) right after it has set FROZEN on the container cgroup. We hit this case under high host load even though our runc has this fix.
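If runc has already been killed at that point, the container cgroup stays FROZEN and the runc init blocked in __refrigerator will not run again until something thaws it. A minimal sketch of such a manual thaw (a hypothetical standalone helper, not an official runc tool; pass it the cgroup directory of the stuck process):

// thaw.go - illustrative sketch, not part of runc: thaw a cgroup so a runc init
// process stuck in __refrigerator can run again and exit. Pass the full cgroup
// directory, e.g.
//   /sys/fs/cgroup/freezer/kubepods.slice/.../cri-containerd-<id>.scope  (cgroup v1)
//   /sys/fs/cgroup/kubepods.slice/.../cri-containerd-<id>.scope          (cgroup v2)
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: thaw <cgroup-dir>")
		os.Exit(1)
	}
	dir := os.Args[1]

	// cgroup v2: write "0" to cgroup.freeze; cgroup v1: write "THAWED" to freezer.state.
	if _, err := os.Stat(filepath.Join(dir, "cgroup.freeze")); err == nil {
		if err := os.WriteFile(filepath.Join(dir, "cgroup.freeze"), []byte("0"), 0o644); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Println("thawed (cgroup v2)")
		return
	}
	if err := os.WriteFile(filepath.Join(dir, "freezer.state"), []byte("THAWED"), 0o644); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("thawed (cgroup v1)")
}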
@kolyshkin runc may need to consider the cgroup FROZEN state when deleting a container.
@kolyshkin If you want to replicate this issue, you can add a time.Sleep command before this line of code, making sure the sleep duration is longer than the context's timeout period.
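The shape of that race can also be illustrated standalone (hypothetical names, not runc code): a caller whose context deadline expires while the "systemd call" is still sleeping, so the step that would thaw the cgroup is never reached.

// racedemo.go - standalone illustration of the race described above, with made-up names.
package main

import (
	"context"
	"fmt"
	"time"
)

// slowSetUnitProperties stands in for the dbus SetUnitPropertiesContext call;
// the sleep models dbus-daemon being paused or the host being overloaded.
func slowSetUnitProperties(ctx context.Context) error {
	select {
	case <-time.After(5 * time.Second): // longer than the caller's timeout
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	// The caller's deadline models the kubelet/containerd gRPC timeout.
	ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
	defer cancel()

	fmt.Println("freeze cgroup") // the container cgroup is now FROZEN
	if err := slowSetUnitProperties(ctx); err != nil {
		fmt.Println("aborted before thaw:", err) // in real life runc is killed here
		return                                   // ...and the cgroup stays FROZEN
	}
	fmt.Println("thaw cgroup") // never reached in this scenario
}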
@wxx213 if you're talking about runc delete -f here, I believe this was fixed in 2021 by commit 6806b2c1 (PR #3134), which made its way into runc v1.1.0. runc v1.0.x releases do not have this fix.
Is there any update? We've run into this issue too; runc version is 1.1.12.
We've managed to reproduce this internally by pausing the dbus-daemon process so that dbus communication gets stuck. We applied a simple patch that adds a timeout to the SetUnitPropertiesContext call, which gives runc init a chance to thaw the cgroup before runc create is killed:
func setUnitProperties(cm *dbusConnManager, name string, properties ...systemdDbus.Property) error {
	return cm.retryOnDisconnect(func(c *systemdDbus.Conn) error {
-		return c.SetUnitPropertiesContext(context.TODO(), name, true, properties...)
+		const timeout = 10 * time.Second
+		ctx, cancel := context.WithTimeout(context.Background(), timeout)
+		defer cancel()
+		return c.SetUnitPropertiesContext(ctx, name, true, properties...)
	})
}
Anyway, we'd like to hear the runc team's official solution and conclusion.
Same issue here, waiting for an official solution.
We are also noticing this issue: processes in D state are increasing the overall CPU load average.
In addition, we see logs where removal of the pod's cgroup directory fails:
time="2025-05-08T06:33:05-04:00" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/misc/kubepods/burstable/podd7790b34-7c01-46d6-8ab4-b2d3dd977f99/de9f2b7a7210227c4776edaeeb30750cc8f275f7930b112834c27ba1d2f355c3: device or resource busy"
Came here from this issue https://github.com/kubernetes/kubernetes/issues/123766
I believe https://github.com/opencontainers/runc/pull/4757 has already fixed this issue, although it hasn't been released yet. Would you mind either:
- Merging this patch and testing it, or
- Waiting for the version containing this fix to be released?
Met the same issue with runc 1.2.6 and k8s 1.32.6:
I0908 17:46:46.420991 2190871 pod_container_manager_linux.go:210] "Failed to delete cgroup paths" cgroupName=["kubepods","burstable","pod7914ce7c-b79e-4748-bd2f-f99ad718116a"] err="unable to destroy cgroup paths for cgroup [kubepods burstable pod7914ce7c-b79e-4748-bd2f-f99ad718116a] : Timed out while waiting for systemd to remove kubepods-burstable-pod7914ce7c_b79e_4748_bd2f_f99ad718116a.slice"
[root@ning203 ~]# systemctl status kubepods-burstable-pod7914ce7c_b79e_4748_bd2f_f99ad718116a.slice
...
Loaded: loaded (/run/systemd/transient/kubepods-burstable-pod7914ce7c_b79e_4748_bd2f_f99ad718116a.slice; transient)
...
CGroup: /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7914ce7c_b79e_4748_bd2f_f99ad718116a.slice
└─cri-containerd-db63ee0795a5850760c67c701d662183ae09b0e1df6a40a54f42717feaeb53bb.scope
└─ 2191866 runc init
[root@ning203 ~]# cat /proc/2191866/stat
2191866 (runc:[2:INIT]) D 1 2191866 2191866 0 -1 4260160 743 0 0 0 0 1 0 0 20 0 6 0 27417423 1646030848 3261 18446744073709551615 93872309833728 93872314901781 140737326215664 0 0 256 0 0 2143420159 0 0 0 17 7 0 0 0 0 0 93872315794280 93872320913697 93872323584000 140737326223159 140737326223169 140737326223169 140737326223336 0
This should be fixed in runc v1.3.1 and newer (v1.3.2, v1.4.0-rc.1). Please let us know if not.