Upgrade from Stable 2605.8.0 to 2605.9.0 breaks Kubernetes running with docker container runtime
The current stable release 2605.9.0 has broken all our clusters (AWS / GCP / Bare metal + dev / prod). The issue we observe is that some nodes become NotReady
after a while and traffic going to containers on those nodes times out.
Moreover, it seems that runc processes are stuck in the D state (uninterruptible sleep), and a remedy to make the nodes responsive again is to kill those processes.
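Roughly what we do on an affected node to spot and clear them (a sketch, not an exact runbook):
# Show runc processes and their states; 'D' in the STAT column is uninterruptible sleep
ps -eo pid,stat,cmd | grep '[r]unc'
# Killing the stuck runc exec/update processes unwedges the node for us;
# the D-state 'runc init' itself usually cannot be signalled
pkill -9 -f 'runc --root /var/run/docker/runtime-runc/moby'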
We are still in the midst of debugging, trying to find the root cause, but in the meantime we thought we would raise an issue detailing as much as we know and update it as we go. This might also be useful for anyone experiencing the same problem who hasn't managed to formalize an issue yet (like us).
Impact
Kubernetes cluster nodes enter a NotReady state and containers cannot receive traffic, so after a while the cluster eventually breaks down.
Environment and steps to reproduce
- Run a Kubernetes cluster on Flatcar Container Linux by Kinvolk 2605.9.0 (Oklo) hosts with Kubernetes version v1.19.2.
- Wait until containerd gets stuck and nodes break.
- We haven't tested whether an aggressive rotation of pods can bring this to the surface earlier (see the watch commands below).
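The watch commands we use while soaking the cluster (nothing fancy, shown for completeness):
# Watch for nodes flapping to NotReady
kubectl get nodes -w
# On a suspect node, see whether kubelet has started reporting PLEG problems
journalctl -u kubelet --since "10 minutes ago" | grep -i pleg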
Expected behavior
Everything should run as usual after the upgrade.
Additional information
Logs from kubelet: at some point we notice that kubelet starts complaining about the containerd runtime not responding:
Dec 10 08:49:34 ip-10-66-23-28 kubelet[1295931]: E1210 08:49:34.050798 1295931 kubelet.go:1765] skipping pod synchronization - [container runtime status check may not have completed yet, PLEG is not healhy: pleg was last seen active 3m1.826056813s ago; threshold is 3m0s]
Also we can see a lot of timeouts:
Dec 09 20:28:36 ip-10-66-23-47 kubelet[747034]: E1209 20:28:36.860894 747034 remote_runtime.go:357] UpdateContainerResources "fc09fafd0c40a9a31198f33fd63569be7e759f97426f056836fff15347ef4304" from runtime service failed: rpc error: code = DeadlineExceeded desc = context
deadline exceeded
Dec 09 20:28:36 ip-10-66-23-47 kubelet[747034]: E1209 20:28:36.860949 747034 cpu_manager.go:418] [cpumanager] reconcileState: failed to update container (pod: calico-node-dmrmn, container: calico-node, container id: fc09fafd0c40a9a31198f33fd63569be7e759f97426f056836fff15
347ef4304, cpuset: "0-1", error: rpc error: code = DeadlineExceeded desc = context deadline exceeded)
Dec 09 20:28:37 ip-10-66-23-47 kubelet[747034]: E1209 20:28:37.816557 747034 remote_runtime.go:389] ExecSync fc09fafd0c40a9a31198f33fd63569be7e759f97426f056836fff15347ef4304 '/bin/calico-node -felix-live -bird-live' from runtime service failed: rpc error: code = Deadline
Exceeded desc = context deadline exceeded
Dec 09 20:28:39 ip-10-66-23-47 kubelet[747034]: E1209 20:28:39.937142 747034 remote_runtime.go:389] ExecSync fc09fafd0c40a9a31198f33fd63569be7e759f97426f056836fff15347ef4304 '/bin/calico-node -felix-ready -bird-ready' from runtime service failed: rpc error: code = Unknow
n desc = operation timeout: context deadline exceeded
Dec 09 20:28:47 ip-10-66-23-47 kubelet[747034]: W1209 20:28:47.221936 747034 cpu_manager.go:397] [cpumanager] reconcileState: ignoring terminated container (pod: graylog-5f7f9ff465-bfdsx, container id: fb82db45342e3b4ba8ae0e9f421d63fc06d2ffba2f897fcdf6e1f0d37854ce0f)
Dec 09 20:28:48 ip-10-66-23-47 kubelet[747034]: W1209 20:28:48.009743 747034 cpu_manager.go:397] [cpumanager] reconcileState: ignoring terminated container (pod: vault-1, container id: bd910527ffde5f1bb6cd1c212653d6af6ffdf39b8185b7df3e93fb22d045f310)
Dec 09 20:28:48 ip-10-66-23-47 kubelet[747034]: W1209 20:28:48.261703 747034 cpu_manager.go:397] [cpumanager] reconcileState: ignoring terminated container (pod: calico-node-dmrmn, container id: bba7f01e56fc3f557ddef9217aadfff5e20d4e5239b98cbb2073a5e739044c97)
Dec 09 20:30:37 ip-10-66-23-47 kubelet[747034]: E1209 20:30:37.817316 747034 remote_runtime.go:389] ExecSync fc09fafd0c40a9a31198f33fd63569be7e759f97426f056836fff15347ef4304 '/bin/calico-node -felix-live -bird-live' from runtime service failed: rpc error: code = Unknown desc = operation timeout: context deadline exceeded
Dec 09 20:30:39 ip-10-66-23-47 kubelet[747034]: E1209 20:30:39.938398 747034 remote_runtime.go:389] ExecSync fc09fafd0c40a9a31198f33fd63569be7e759f97426f056836fff15347ef4304 '/bin/calico-node -felix-ready -bird-ready' from runtime service failed: rpc error: code = Unknown desc = operation timeout: context deadline exceeded
Dec 09 20:30:48 ip-10-66-23-47 kubelet[747034]: E1209 20:30:48.261900 747034 remote_runtime.go:357] UpdateContainerResources "fc09fafd0c40a9a31198f33fd63569be7e759f97426f056836fff15347ef4304" from runtime service failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Looking at the processes for a hung container, we can see:
root 10263 0.0 0.0 4368 708 ? Ds Dec09 0:00 | \_ /usr/local/bin/runsvdir -P /etc/service/enabled
root 10878 0.0 0.0 4216 808 ? Ds Dec09 0:00 | | \_ runsv felix
root 10884 0.7 0.8 1578988 63452 ? Sl Dec09 8:12 | | | \_ calico-node -felix
root 10879 0.0 0.0 4216 836 ? Ds Dec09 0:00 | | \_ runsv bird
root 11318 0.0 0.0 884 656 ? D Dec09 0:08 | | | \_ bird -R -s /var/run/calico/bird.ctl -d -c /etc/calico/confd/config/bird.cfg
root 10880 0.0 0.0 4216 820 ? Ds Dec09 0:00 | | \_ runsv monitor-addresses
root 10885 0.0 0.6 1136340 53668 ? Sl Dec09 0:03 | | | \_ calico-node -monitor-addresses
root 10881 0.0 0.0 4216 740 ? Ds Dec09 0:00 | | \_ runsv confd
root 10886 0.0 0.7 1505000 56496 ? Sl Dec09 0:14 | | | \_ calico-node -confd
root 10882 0.0 0.0 4216 832 ? Ds Dec09 0:00 | | \_ runsv bird6
root 11319 0.0 0.0 744 8 ? D Dec09 0:05 | | | \_ bird6 -R -s /var/run/calico/bird6.ctl -d -c /etc/calico/confd/config/bird6.cfg
root 10883 0.0 0.0 4216 852 ? Ds Dec09 0:00 | | \_ runsv allocate-tunnel-addrs
root 10888 0.0 0.5 1210072 42616 ? Sl Dec09 0:02 | | \_ calico-node -allocate-tunnel-addrs
root 1111612 0.0 0.2 1232712 22308 ? Sl Dec09 0:00 | \_ runc --root /var/run/docker/runtime-runc/moby --log /run/docker/libcontainerd/containerd/io.containerd.runtime.v1.linux/moby/bd138a9c7020cf1ec127d5934d9307a2af6d1fabfc8d79f8a5972d70cc57aa49/log.json --log-format json --systemd-cgroup exec --process /tmp/runc-process535500753 --detach --pid-file /run/docker/libcontainerd/containerd/io.containerd.runtime.v1.linux/moby/bd138a9c7020cf1ec127d5934d9307a2af6d1fabfc8d79f8a5972d70cc57aa49/ff1812165d7c7e84fc1e4dce4e86358c4f244430fea1d0c8445a1123f9e23fcb.pid bd138a9c7020cf1ec127d5934d9307a2af6d1fabfc8d79f8a5972d70cc57aa49
root 1111635 0.0 0.2 1085760 18012 ? Dsl Dec09 0:00 | | \_ runc init
root 1111628 8.7 0.2 1233864 21296 ? Sl Dec09 45:51 | \_ runc --root /var/run/docker/runtime-runc/moby --log /run/docker/libcontainerd/containerd/io.containerd.runtime.v1.linux/moby/bd138a9c7020cf1ec127d5934d9307a2af6d1fabfc8d79f8a5972d70cc57aa49/log.json --log-format json --systemd-cgroup update --resources - bd138a9c7020cf1ec127d5934d9307a2af6d1fabfc8d79f8a5972d70cc57aa49
Killing the runc processes from the above seems to make the node function properly again.
Note that we are mainly observing this on calico-node pods (as also visible from the kubelet logs above), but we will try to reproduce it with other pods.
Also, we found https://github.com/moby/moby/issues/40817, which describes somewhat similar behaviour (even though it is quite old, and the latest stable Flatcar comes with a containerd version > 1.4.0, which is blamed there).
The upgrade from 2605.8.0 to 2605.9.0 bumped the containerd version from 1.3.7 to 1.4.3. The Docker version didn't change much, going from 19.03.12 to 19.03.14.
The containerd issue identified in the linked moby issue seems to be this one: containerd/containerd/issues/4509 (Containers with curl based health checks becoming unresponsive). That issue was introduced in 1.4.0 and fixed with 1.4.1, so it shouldn't be the same issue. It could be another issue introduced with 1.4.0 that hadn't been found until now, but that raises the question of why it only surfaces with specific Flatcar setups...
The linked containerd issue has a bunch of strategies for getting more information about what's going on. Check https://github.com/containerd/containerd/issues/4509#issuecomment-689671931 in particular, which describes how to create goroutine dumps as well as how to configure containerd to generate debug output.
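The gist of that comment is roughly the following (a sketch; the linked comment has the exact steps):
# Ask the stuck shim (or containerd itself) to write a goroutine dump to its log
kill -USR1 <pid-of-containerd-shim-or-containerd>
# Enable debug output for containerd in /etc/containerd/config.toml, then restart it:
#   [debug]
#     level = "debug"
systemctl restart containerd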
@marga-kinvolk tyvm for the advice, I followed that comment and got a goroutine dump of the stuck containerd-shim process here: https://gist.githubusercontent.com/ffilippopoulos/72406746b8969e86cc735a97e364faed/raw/a3343972e7b90142621bbc19df3ce89aff216879/goroutine-stack-dump
I am not sure here but it looks like it's stuck waiting to read from a unix socket. Could this be related to changes regarding CVE-2020-15257: https://github.com/containerd/containerd/security/advisories/GHSA-36xw-fx78-c5r4 ?
A little bit more information from the misbehaving container and its runc init process.
● docker-87a4886cbdaba184ef499c9a4f11debee4ae0b871cb5d1bdd65089db14884b76.scope - libcontainer container 87a4886cbdaba184ef499c9a4f11debee4ae0b871cb5d1bdd65089db14884b76
Loaded: loaded (/run/systemd/transient/docker-87a4886cbdaba184ef499c9a4f11debee4ae0b871cb5d1bdd65089db14884b76.scope; transient)
Transient: yes
Drop-In: /run/systemd/transient/docker-87a4886cbdaba184ef499c9a4f11debee4ae0b871cb5d1bdd65089db14884b76.scope.d
└─50-DevicePolicy.conf, 50-DeviceAllow.conf, 50-CPUShares.conf, 50-CPUQuotaPeriodSec.conf, 50-CPUQuota.conf
Active: active (running) since Mon 2020-12-07 22:17:00 UTC; 3 days ago
Tasks: 47 (limit: 8890)
Memory: 107.6M
CPU: 6min 34.971s
CGroup: /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod1eaa232c_8a32_4893_b751_677bd7c942b6.slice/docker-87a4886cbdaba184ef499c9a4f11debee4ae0b871cb5d1bdd65089db14884b76.scope
├─ 3595 /usr/local/bin/runsvdir -P /etc/service/enabled
├─ 4477 runsv bird
├─ 4478 runsv felix
├─ 4479 runsv allocate-tunnel-addrs
├─ 4480 runsv confd
├─ 4481 runsv monitor-addresses
├─ 4482 runsv bird6
├─ 4485 calico-node -allocate-tunnel-addrs
├─ 4487 calico-node -monitor-addresses
├─ 4897 bird -R -s /var/run/calico/bird.ctl -d -c /etc/calico/confd/config/bird.cfg
├─ 4919 bird6 -R -s /var/run/calico/bird6.ctl -d -c /etc/calico/confd/config/bird6.cfg
├─ 13891 calico-node -confd
├─ 14090 calico-node -felix
└─573668 runc init
Dec 07 22:17:00 worker-k8s-exp-1-d54s.c.uw-dev.internal systemd[1]: Started libcontainer container 87a4886cbdaba184ef499c9a4f11debee4ae0b871cb5d1bdd65089db14884b76.
# ps -fe | grep 57366[8]
root 573668 573651 0 Dec08 ? 00:00:00 runc init
# cat /proc/573668/status | grep State
State: D (disk sleep)
# cat /proc/573668/stack
[<0>] flush_old_exec+0x3e9/0x6e0
[<0>] load_elf_binary+0x32e/0x1640
[<0>] search_binary_handler+0x84/0x1b0
[<0>] __do_execve_file.isra.0+0x4e3/0x830
[<0>] __x64_sys_execve+0x35/0x40
[<0>] do_syscall_64+0x4e/0x120
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
# lsof -p 573668
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
runc:[2:I 573668 root cwd DIR 0,175 4096 2616498 /
runc:[2:I 573668 root rtd DIR 0,175 4096 2616498 /
runc:[2:I 573668 root txt REG 0,1 11169248 5497341 /memfd:runc_cloned:/proc/self/exe (deleted)
runc:[2:I 573668 root mem REG 254,0 1834704 33372 /usr/lib64/libc-2.32.so
runc:[2:I 573668 root mem REG 254,0 132968 32953 /usr/lib64/libseccomp.so.2.5.0
runc:[2:I 573668 root mem REG 254,0 109128 33373 /usr/lib64/libpthread-2.32.so
runc:[2:I 573668 root mem REG 254,0 177624 33357 /usr/lib64/ld-2.32.so
runc:[2:I 573668 root 0r CHR 1,3 0t0 8338 /dev/null
runc:[2:I 573668 root 1w FIFO 0,11 0t0 5497311 pipe
runc:[2:I 573668 root 2w FIFO 0,11 0t0 5497312 pipe
runc:[2:I 573668 root 3u unix 0xffff89accaf79800 0t0 5498207 type=STREAM
runc:[2:I 573668 root 4w FIFO 0,11 0t0 5498209 pipe
runc:[2:I 573668 root 6u a_inode 0,12 0 1029 [eventpoll]
runc:[2:I 573668 root 7r FIFO 0,11 0t0 5497355 pipe
runc:[2:I 573668 root 8w FIFO 0,11 0t0 5497355 pipe
The freezer state seems to be stuck in FREEZING:
# pwd
/sys/fs/cgroup/freezer/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod1eaa232c_8a32_4893_b751_677bd7c942b6.slice/docker-87a4886cbdaba184ef499c9a4f11debee4ae0b871cb5d1bdd65089db14884b76.scope
# cat freezer.state
FREEZING
Frozen by itself:
# cat freezer.parent_freezing
0
# cat freezer.self_freezing
1
I've found that I can recover the container and bring the node back to a Ready state by writing THAWED to the freezer state:
echo THAWED > freezer.state
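To find (and thaw) any other containers stuck like this on a node, something along these lines works for me (cgroup v1 paths assumed):
# List freezer cgroups stuck in FREEZING and thaw them
for f in $(grep -rlw --include=freezer.state FREEZING /sys/fs/cgroup/freezer 2>/dev/null); do
  echo "thawing $(dirname "$f")"
  echo THAWED > "$f"
done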
Some more findings.
Firstly, the issue can be reproduced with a very simple exec livenessProbe:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: testcase
spec:
  selector:
    matchLabels:
      app: testcase
  template:
    metadata:
      labels:
        app: testcase
    spec:
      containers:
        - name: testcase
          image: alpine:3.12
          command:
            - sh
            - -c
            - |
              touch /tmp/healthy
              while true; do sleep 86400; done
          livenessProbe:
            exec:
              command:
                - sh
                - -c
                - cat /tmp/healthy
            initialDelaySeconds: 5
            periodSeconds: 5
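To soak it I just apply the DaemonSet and then watch the nodes for hung exec probes (the filename here is arbitrary):
kubectl apply -f testcase-daemonset.yaml
# once probes start timing out, look for stuck runc processes on the node
ps -eo pid,stat,cmd | grep '[r]unc init'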
I wasn't able to reproduce it with an equivalent docker healthcheck though. At least not within a day of soaking it across a number of nodes.
$ docker run -d --name testcase --health-cmd "cat /tmp/healthy" --health-interval 1s --rm alpine:3.12 sh -c "touch /tmp/healthy; while true; do sleep 86400; done"
Secondly, I tried running 2605.9.0 with binaries from the official docker 19.03.14 release as laid out here: https://kinvolk.io/docs/flatcar-container-linux/latest/container-runtimes/use-a-custom-docker-or-containerd-version/.
So far the runc hang hasn't been reproduced on those nodes.
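Roughly what that looks like on our nodes (the drop-in below is a simplified sketch; the linked doc has the canonical steps and the full set of flags to keep):
# Unpack the upstream static binaries
mkdir -p /opt/docker && tar -xzf docker-19.03.14.tgz -C /opt/docker --strip-components=1
# Point the docker unit at the custom dockerd via a systemd drop-in
mkdir -p /etc/systemd/system/docker.service.d
cat <<'EOF' > /etc/systemd/system/docker.service.d/10-custom-dockerd.conf
[Service]
ExecStart=
ExecStart=/opt/docker/dockerd --host=fd://
EOF
systemctl daemon-reload && systemctl restart docker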
There are quite a few differences between what ships with Flatcar and what comes bundled with the release.
Flatcar:
Client:
Version: 19.03.14
API version: 1.40
Go version: go1.15.5
Git commit: f8db474
Built: Tue Dec 1 19:10:28 2020
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 19.03.14
API version: 1.40 (minimum version 1.12)
Go version: go1.15.5
Git commit: f8db474
Built: Tue Dec 1 19:10:28 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.3
GitCommit: 6806845b4f638417933f68721e289c9aeda456b1
runc:
Version: 1.0.0-rc92+dev.docker-19.03
GitCommit: 3d68c79de7184b0eba97946d4f478736f46bf207
docker-init:
Version: 0.18.0
GitCommit: fec3683b971d9c3ef73f284f176672c44b448662
Docker release:
Client: Docker Engine - Community
Version: 19.03.14
API version: 1.40
Go version: go1.13.15
Git commit: 5eb3275
Built: Tue Dec 1 19:14:24 2020
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.14
API version: 1.40 (minimum version 1.12)
Go version: go1.13.15
Git commit: 5eb3275
Built: Tue Dec 1 19:21:08 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: v1.3.9
GitCommit: ea765aba0d05254012b0b9e595e995c09186427f
runc:
Version: 1.0.0-rc10
GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
docker-init:
Version: 0.18.0
GitCommit: fec368
Perhaps notably, the Go version is different: 1.13.15 vs 1.15.5. There are some comments on the containerd issue around avoiding Go 1.15: https://github.com/containerd/containerd/issues/4509#issuecomment-687389617. Perhaps this is relevant?
Finally, the issue doesn't reproduce for us when we use containerd as the runtime for Kubernetes as laid out here: https://kinvolk.io/docs/flatcar-container-linux/latest/container-runtimes/switching-from-docker-to-containerd-for-kubernetes/.
As we'll eventually have to switch to containerd anyway, we're going to push forward with that as a workaround.
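For reference, on the kubelet side the switch essentially boils down to pointing it at the containerd CRI socket (flag names as of Kubernetes 1.19; the socket path may differ depending on how containerd is set up):
# kubelet flags for using containerd via CRI (the rest of the kubelet configuration stays the same)
--container-runtime=remote
--container-runtime-endpoint=unix:///run/containerd/containerd.sock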
Another update.
We started getting the same runc hangs on exec probes with containerd after enabling systemd cgroups:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
We were previously enabling the systemd cgroup driver in docker. After discovering this in containerd, I tried to reproduce the issue with systemd cgroups disabled in docker and I couldn't, so it seems like systemd cgroups are significant here.
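For context, the systemd cgroup driver in docker is typically enabled with the native.cgroupdriver exec-opt, i.e. something like this in daemon.json (or the equivalent --exec-opt flag on dockerd):
cat <<'EOF' > /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
systemctl restart docker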
@ribbybibby The issue you linked was also about the unconsumed shim debug log. Did you disable shim debug, or did you use the Flatcar defaults, which have it enabled (at least for the releases up to now)?
Thanks for finding out that the cgroup driver makes a difference. The default on Flatcar is cgroupfs.
Did you set the kubelet cgroupDriver setting to the same value as the containerd setting?
@pothos Yep, the kubelet cgroupDriver setting matched the containerd setting. For a while we were mistakenly running with a mismatch (cgroupDriver: systemd in kubelet, but not in containerd) and the issue wasn't reproduced in that state.
We've been using the flatcar default shim debug setting.
Ok, so the question is what freezes the cgroup (and doesn't unfreeze it). Just to make sure, this happens with cgroup v1, right? (This is the default under Flatcar unless one forces it to use a hybrid or v2 cgroup setup.) We only test and ship Docker on Flatcar with the cgroupfs driver and cgroup v1, but eventually it would be good to be able to switch and make sure there are no bugs.
Just to make sure, this happens with cgroup v1, right? (This is the default under Flatcar unless one forces it to use a hybrid or v2 cgroup setup.)
Yep, we haven't touched the cgroup version so it must be v1. Not sure how to check that definitively though.
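Edit: a quick check that seems to settle it (cgroup2fs here would indicate the unified/v2 hierarchy, tmpfs indicates v1):
# filesystem type of the cgroup mount point
stat -fc %T /sys/fs/cgroup
# and/or list the cgroup mounts; a cgroup2 mount only under /sys/fs/cgroup/unified would mean hybrid
mount | grep cgroup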
We've seen something similar. We have coredumps disabled for security compliance reasons, but see this in the kernel logs:
Jan 26 21:57:51 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: Core dump to |/dev/null pipe failed
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: INFO: task dockerd:2425 blocked for more than 120 seconds.
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: Not tainted 4.19.143-flatcar #1
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: dockerd D 0 2425 1 0x00000080
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: Call Trace:
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: __schedule+0x29f/0x840
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: schedule+0x28/0x80
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: wb_wait_for_completion+0x5e/0x90
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: ? wait_woken+0x80/0x80
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: sync_inodes_sb+0xce/0x280
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: sync_filesystem+0x5e/0x90
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: 0xffffffffc041b222
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: sync_filesystem+0x78/0x90
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: generic_shutdown_super+0x22/0x100
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: kill_anon_super+0x14/0x30
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: deactivate_locked_super+0x34/0x70
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: cleanup_mnt+0x3f/0x70
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: task_work_run+0x8a/0xb0
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: exit_to_usermode_loop+0xd3/0xe0
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: do_syscall_64+0xe1/0x100
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: RIP: 0033:0x55d897adf8a0
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: Code: Bad RIP value.
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: RSP: 002b:000000c4240c4f08 EFLAGS: 00000216 ORIG_RAX: 00000000000000a6
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000055d897adf8a0
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: RDX: 0000000000000000 RSI: 0000000000000002 RDI: 000000c42354fdc0
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: RBP: 000000c4240c4f68 R08: 0000000000000000 R09: 0000000000000000
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: R10: 0000000000000000 R11: 0000000000000216 R12: ffffffffffffffff
Jan 26 21:58:09 vmss-agent-worker1-08prodva7-jrlna000H7R kernel: R13: 0000000000000045 R14: 0000000000000044 R15: 0000000000000049
Does it work with cgroupfs as the cgroup driver in containerd/kubelet?
P.S.: Interesting that you are still on a 4.19 kernel ;)
We had a mix of 4.19 and 5.4 running while we were debugging #328 :)
We had --exec-opt native.cgroupdriver=cgroupfs added to the dockerd flags, and use the default cgroupfs in the kubelet config. I don't think we are setting containerd explicitly as documented here, though: https://kubernetes.io/docs/setup/production-environment/container-runtimes/#containerd. Although we are using k8s 1.18.14, which still uses dockerd by default IIRC?
Thanks for the response. I wonder if this is really the same, because the above report is about self_freezing being turned on, while you saw a hung task in dmesg.
I had a suspicion that the downgrade from go 1.15 -> 1.13 for the docker binaries in 2605.12.0 might squash this bug, but I can still reproduce it.
Edit: I'm using K8s 1.18.18 along with Kops 1.18.3.
I'm not 100% sure if I'm hitting the same issue but it feels similar.
I upgraded a Kubernetes cluster from 2605.12.0 to 2765.2.2. I see that Docker gets updated from 19.03.14 to 19.03.15.
The rollout went fine, but then I found out that (only) master nodes start flapping from Ready to NotReady. It happens very fast.
Checking the containerd logs, I see a lot of these lines repeated:
Apr 28 06:45:50 ip-172-15-62-176.ec2.internal env[1924]: time="2021-04-28T06:45:50.092158422Z" level=info msg="shim reaped" id=0e569ec15c912ba007ef5a1d0a2002ae4a4bcbb7003200353a644b9f71a781c3
Apr 28 06:45:50 ip-172-15-62-176.ec2.internal env[1924]: time="2021-04-28T06:45:50.291288272Z" level=warning msg="runtime v1 is deprecated since containerd v1.4, consider using runtime v2"
Apr 28 06:45:50 ip-172-15-62-176.ec2.internal env[1924]: time="2021-04-28T06:45:50.292082108Z" level=info msg="shim containerd-shim started" address="unix:///run/containerd/s/e94648437d3d307b9139cc384e736f77fec0f4cf63e52f8b2f6ac829931291a6" debug=false pid=3505968
The shim reaped does not look good 😇. Maybe it is time to get rid of the shim and Docker altogether? Or should I revert to the previous Flatcar? Thanks
I'm seeing the shim reaped / shim started messages repeating in my containerd logs too. Our prod API mysteriously stops working until docker is restarted, then it works for around 5-24 hours. We're not even using Kubernetes in front of it.
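Roughly what I use to see how often the churn happens (the unit name may differ on your setup):
# count shim reaped / shim started messages over the last hour
journalctl -u containerd --since "1 hour ago" | grep -c "shim reaped"
journalctl -u containerd --since "1 hour ago" | grep -c "shim containerd-shim started"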