
runc hangs on init when containerd is set up

Open smileusd opened this issue 1 year ago • 17 comments

Description

I found some D-state processes on a node where containerd was set up:

5 D root      14378  13862  0  80   0 - 269979 refrig Sep30 ?       00:00:00 /usr/local/bin/runc init
5 D root      14392  13587  0  80   0 - 270107 refrig Sep30 ?       00:00:00 /usr/local/bin/runc init
0 S root     278169 276735  0  80   0 -  1007 pipe_r 00:44 pts/2    00:00:00 grep --color=auto  D 
root@hostname:~# cat /proc/14378/stack
[<0>] __refrigerator+0x4c/0x130
[<0>] unix_stream_data_wait+0x1fa/0x210
[<0>] unix_stream_read_generic+0x50d/0xa60
[<0>] unix_stream_recvmsg+0x88/0x90
[<0>] sock_recvmsg+0x70/0x80
[<0>] sock_read_iter+0x8f/0xf0
[<0>] new_sync_read+0x180/0x190
[<0>] vfs_read+0xff/0x1a0
[<0>] ksys_read+0xb1/0xe0
[<0>] __x64_sys_read+0x19/0x20
[<0>] do_syscall_64+0x5c/0xc0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
root@hostname:~# uptime
 01:32:12 up 28 days, 38 min,  2 users,  load average: 29.57, 31.53, 31.98
root@hostname:~# systemctl status containerd
● containerd.service - containerd container runtime
     Loaded: loaded (/etc/systemd/system/containerd.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2024-09-30 00:53:45 -07; 4 weeks 0 days ago
root@hostname:~# ps -eo pid,lstart,cmd,state |grep 14378
 14378 Mon Sep 30 00:53:38 2024 /usr/local/bin/runc init    D
root@hostname:~# stat /var/containerd/containerd.sock
  File: /var/containerd/containerd.sock
  Size: 0               Blocks: 0          IO Block: 4096   socket
Device: 10303h/66307d   Inode: 1082291752  Links: 1
Access: (0660/srw-rw----)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2024-10-28 00:45:08.361633324 -0700
Modify: 2024-09-30 00:53:45.666038162 -0700
Change: 2024-09-30 00:53:45.666038162 -0700
 Birth: 2024-09-30 00:53:45.666038162 -0700

The runc init process was started before /var/containerd/containerd.sock changed. I think there is some kind of race here? In any case, I think the runc init process should time out and exit rather than hang.

Steps to reproduce the issue

No response

Describe the results you received and expected

The runc init process hangs. Expected: no D-state processes.

What version of runc are you using?

~# runc --version
runc version 1.1.2
commit: c4f88bc9
spec: 1.0.2-dev
go: go1.17.13
libseccomp: 2.5.3

Host OS information

~# cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.5 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.5 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
BUILD_ID="ubuntu-240918-061134"

Host kernel information

~# uname -a
Linux tess-node-ttbts-tess134.stratus.lvs.ebay.com 5.15.0-26-generic #26 SMP Wed Sep 18 09:16:49 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

smileusd avatar Oct 28 '24 08:10 smileusd

/kind bug

smileusd avatar Oct 28 '24 08:10 smileusd

runc version 1.1.2

Could you please update runc to v1.1.14 to check whether this situation exists or not? https://github.com/opencontainers/runc/releases/tag/v1.1.14

lifubang avatar Oct 28 '24 09:10 lifubang

This probably means runc was killed in the middle of container creation, and thus its child (runc init) was left behind. I vaguely remember we did something about it, so yes, it makes sense to try the latest runc 1.2.0 or a newer 1.1.x release (the latest being 1.1.15 ATM).

kolyshkin avatar Oct 28 '24 22:10 kolyshkin

Being stuck in __refrigerator means that the code is in a frozen cgroupv2 cgroup. I'm pretty sure we had some patches in the past 2 years that fixed this issue?

cyphar avatar Oct 29 '24 17:10 cyphar

Being stuck in __refrigerator means that the code is in a frozen cgroupv2 cgroup. I'm pretty sure we had some patches in the past 2 years that fixed this issue?

Right! There were fixes in #3223, but they made it into v1.1.0. We might have some more fixes on top of this though; plus, I guess, someone can freeze a cgroup mid-flight, resulting in the same stuck runc init.

@smileusd can you check whether the cgroups these runc init processes are in are in a frozen state?
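
For example, something like this (the PID is taken from the original report and the paths are placeholders; adjust to your cgroup layout):

# find the cgroup(s) a stuck runc init process belongs to
cat /proc/14378/cgroup
# cgroup v1: check the freezer state for the path shown on the "freezer" line
cat /sys/fs/cgroup/freezer/<cgroup-path>/freezer.state
# cgroup v2: a value of 1 means the cgroup is frozen
cat /sys/fs/cgroup/<cgroup-path>/cgroup.freeze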

kolyshkin avatar Oct 29 '24 23:10 kolyshkin

Met the same issue with runc 1.1.12 and k3s 1.29.4:

# cat /sys/fs/cgroup/freezer/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podfc602c7f_74d9_4696_bd15_5e3a6433e012.slice/cri-containerd-781fbd52240e017d380aa3cccf42ef379a50c32c703363d0b1e9c1fb10bf17b1.scope/freezer.state
FROZEN

cheese avatar Nov 01 '24 10:11 cheese

Being stuck in __refrigerator means that the code is in a frozen cgroupv2 cgroup. I'm pretty sure we had some patches in the past 2 years that fixed this issue?

Right! There were fixes in #3223, but they made it into v1.1.0. We might have some more fixes on top of this though; plus, I guess, someone can freeze a cgroup mid-flight, resulting in the same stuck runc init.

@smileusd can you check whether the cgroups these runc init processes are in are in a frozen state?

The runc process may be killed because of the context timeout (which is the gRPC call timeout from kubelet) right after it has set FROZEN for the container cgroup. We have hit this case under high host load even though our runc has this fix.

wxx213 avatar Nov 11 '24 11:11 wxx213

Being stuck in __refrigerator means that the code is in a frozen cgroupv2 cgroup. I'm pretty sure we had some patches in the past 2 years that fixed this issue?

Right! There were fixes in #3223, but they made it into v1.1.0. We might have some more fixes on top of this though; plus, I guess, someone can freeze a cgroup mid-flight, resulting in the same stuck runc init. @smileusd can you check whether the cgroups these runc init processes are in are in a frozen state?

The runc process may be killed because of the context timeout (which is the gRPC call timeout from kubelet) right after it has set FROZEN for the container cgroup. We have hit this case under high host load even though our runc has this fix.

@kolyshkin runc may need to consider the cgroup FROZEN state when deleting a container.

wxx213 avatar Nov 11 '24 11:11 wxx213

Being stuck in __refrigerator means that the code is in a frozen cgroupv2 cgroup. I'm pretty sure we had some patches in the past 2 years that fixed this issue?

Right! There were fixes in #3223, but they made it into v1.1.0. We might have some more fixes on top of this though; plus, I guess, someone can freeze a cgroup mid-flight, resulting in the same stuck runc init. @smileusd can you check whether the cgroups these runc init processes are in are in a frozen state?

The runc process may be killed because of the context timeout (which is the gRPC call timeout from kubelet) right after it has set FROZEN for the container cgroup. We have hit this case under high host load even though our runc has this fix.

@kolyshkin runc may need to consider the cgroup FROZEN state when deleting a container.

@kolyshkin If you want to replicate this issue, you can add a time.Sleep call before this line of code, making sure the sleep duration is longer than the context's timeout period.

jianghao65536 avatar Nov 11 '24 11:11 jianghao65536

@kolyshkin runc may need to consider the cgroup FROZEN state when deleting a container.

@wxx213 if you're talking about runc delete -f here, I believe this was fixed in 2021 by commit 6806b2c1 (PR #3134), which made its way into runc v1.1.0. runc v1.0.x releases do not have this fix.
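
In the meantime, a manual workaround worth trying is to thaw the frozen cgroup by hand, which should let the stuck runc init continue (the paths below are placeholders):

# cgroup v1 freezer
echo THAWED > /sys/fs/cgroup/freezer/<container-cgroup>/freezer.state
# cgroup v2
echo 0 > /sys/fs/cgroup/<container-cgroup>/cgroup.freeze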

kolyshkin avatar Dec 12 '24 23:12 kolyshkin

Is there any update? We've run into this issue too. The runc version is 1.1.12.

linsite avatar Feb 08 '25 09:02 linsite

We've managed to reproduce this internally by pausing the dbus-daemon process so that dbus communication gets stuck. We applied a simple patch that adds a timeout before the SetUnitPropertiesContext call, which gives runc init a chance to thaw the cgroup before runc create is killed.

func setUnitProperties(cm *dbusConnManager, name string, properties ...systemdDbus.Property) error {
 	return cm.retryOnDisconnect(func(c *systemdDbus.Conn) error {
-		return c.SetUnitPropertiesContext(context.TODO(), name, true, properties...)
+		const timeout = 10 * time.Second
+		ctx, cancel := context.WithTimeout(context.Background(), timeout)
+		defer cancel()
+		return c.SetUnitPropertiesContext(ctx, name, true, properties...)
 	})
 }
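
Pausing dbus-daemon can be done with plain signals; an illustrative reproduction sketch (not our exact procedure) is:

# stop dbus-daemon so the SetUnitPropertiesContext call blocks
kill -STOP "$(pidof dbus-daemon)"
# ...create a container so runc create blocks on dbus and is killed by the caller's timeout...
# resume dbus-daemon afterwards
kill -CONT "$(pidof dbus-daemon)"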

Anyway, we'd like to hear the runc team's official solution and conclusion.

linsite avatar Feb 26 '25 07:02 linsite

Same issue, waiting for official solution.

ayetkin avatar May 05 '25 08:05 ayetkin

We are also noticing this issue: processes stuck in D state, which increases the overall CPU load average.

We are also seeing logs where removal of the pod's cgroup directory fails:

time="2025-05-08T06:33:05-04:00" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/misc/kubepods/burstable/podd7790b34-7c01-46d6-8ab4-b2d3dd977f99/de9f2b7a7210227c4776edaeeb30750cc8f275f7930b112834c27ba1d2f355c3: device or resource busy"
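
To see what is still holding that cgroup open, one option (illustrative, using the path from the error above) is:

# list processes still attached to the pod cgroup and its children
find /sys/fs/cgroup/misc/kubepods/burstable/podd7790b34-7c01-46d6-8ab4-b2d3dd977f99 -name cgroup.procs -exec cat {} +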

Came here from this issue https://github.com/kubernetes/kubernetes/issues/123766

srajappa avatar May 19 '25 15:05 srajappa

I believe https://github.com/opencontainers/runc/pull/4757 has already fixed this issue, although it hasn't been released yet. Would you mind either:

  • Merging this patch and testing it, or

  • Waiting for the version containing this fix to be released?

HirazawaUi avatar Jun 21 '25 16:06 HirazawaUi

Met the same issue with runc 1.2.6 and k8s 1.32.6:

I0908 17:46:46.420991 2190871 pod_container_manager_linux.go:210] "Failed to delete cgroup paths" cgroupName=["kubepods","burstable","pod7914ce7c-b79e-4748-bd2f-f99ad718116a"] err="unable to destroy cgroup paths for cgroup [kubepods burstable pod7914ce7c-b79e-4748-bd2f-f99ad718116a] : Timed out while waiting for systemd to remove kubepods-burstable-pod7914ce7c_b79e_4748_bd2f_f99ad718116a.slice"

[root@ning203 ~]# systemctl status kubepods-burstable-pod7914ce7c_b79e_4748_bd2f_f99ad718116a.slice
...
     Loaded: loaded (/run/systemd/transient/kubepods-burstable-pod7914ce7c_b79e_4748_bd2f_f99ad718116a.slice; transient)
...
     CGroup: /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7914ce7c_b79e_4748_bd2f_f99ad718116a.slice
             └─cri-containerd-db63ee0795a5850760c67c701d662183ae09b0e1df6a40a54f42717feaeb53bb.scope
               └─ 2191866 runc init

[root@ning203 ~]# cat /proc/2191866/stat
2191866 (runc:[2:INIT]) D 1 2191866 2191866 0 -1 4260160 743 0 0 0 0 1 0 0 20 0 6 0 27417423 1646030848 3261 18446744073709551615 93872309833728 93872314901781 140737326215664 0 0 256 0 0 2143420159 0 0 0 17 7 0 0 0 0 0 93872315794280 93872320913697 93872323584000 140737326223159 140737326223169 140737326223169 140737326223336 0

happyzzz1997 avatar Sep 09 '25 02:09 happyzzz1997

This should be fixed in runc v1.3.1 and newer (v1.3.2, v1.4.0-rc.1). Please let us know if not.

kolyshkin avatar Oct 08 '25 03:10 kolyshkin