[BUG] In CentOS Stream 9, when a container gets OOM-killed, it kills the cluster
Hi all. I'm running some C# applications in a local k3d cluster on my CentOS Stream 9 machine and am noticing some troubling behavior. In short, when a container in my cluster gets OOM-killed, the kill takes out the process, the parent container, the parent pod, and the k3d node itself, along with any other pods/containers running on that node, effectively breaking the cluster. Standing up a multi-node cluster only mitigates this somewhat: the individual nodes die one by one as this happens until the entire cluster is eventually dead.
I've narrowed this down specifically to when the cgroup driver is `systemd` (the default on CentOS Stream 9). I do not see this behavior when I switch the cgroup driver to `cgroupfs`.
More details below.
What did you do
I took the following steps to repro...
- Create a new k3d cluster
- Run a pod in the cluster
- Simulate an OOM-kill scenario which should kill the container and restart the pod
- Observe that it instead kills the entire node, and in single node clusters, effectively breaks the cluster
The individual commands for the first 3 steps...
# create the cluster
k3d cluster create
# run a pod in the cluster
kubectl run testpod --image=bitnami/dotnet -- tail -f /dev/null
# simulate an OOM-kill scenario by tailing /dev/zero (https://askubuntu.com/a/1188074)
kubectl exec -it testpod -- tail /dev/zero
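For a more deterministic variant of step 3, the OOM kill can also be forced at the container level with an explicit memory limit, so the kill doesn't depend on host-wide memory pressure (a sketch I haven't run as-is; the pod name and limit value are arbitrary):

# run a memory-limited pod...
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: testpod-limited
spec:
  containers:
  - name: dotnet
    image: bitnami/dotnet
    command: ["tail", "-f", "/dev/null"]
    resources:
      limits:
        memory: "128Mi"
EOF
# ...then exhaust the limit the same way
kubectl exec -it testpod-limited -- tail /dev/zero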
What happened
The OOM-killer kills the process, the parent container, the parent pod, as well as the parent k3d node, effectively breaking the cluster.
When I try to check the state of my cluster, I see that Docker and k3d report the node as "restarting", and it never recovers...
> docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a0e0a101181e ghcr.io/k3d-io/k3d-proxy:5.4.9 "/bin/sh -c nginx-pr…" 2 minutes ago Up 2 minutes 80/tcp, 0.0.0.0:43481->6443/tcp k3d-k3s-default-serverlb
aa3536b6b0dd rancher/k3s:v1.25.7-k3s1 "/bin/k3d-entrypoint…" 2 minutes ago Restarting (1) 5 seconds ago k3d-k3s-default-server-0
> k3d node list
NAME ROLE CLUSTER STATUS
k3d-k3s-default-server-0 server k3s-default restarting
k3d-k3s-default-serverlb loadbalancer k3s-default running
> k3d cluster list
NAME SERVERS AGENTS LOADBALANCER
k3s-default 1/1 0/0 true
> kubectl get nodes
Unable to connect to the server: EOF
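For anyone reproducing this, `docker inspect` can also show what the engine recorded for the node container's exit, and whether Docker itself flagged an OOM kill; this is a diagnostic suggestion rather than output I captured:

> docker inspect -f '{{.State.ExitCode}} {{.State.OOMKilled}}' k3d-k3s-default-server-0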
I'm able to confirm with `journalctl` that the OOM-killer killed the process and all parent processes up to the k3d node...
> sudo journalctl --since "Mar 17 00:00:00" > with-systemd.log
# parsing that log, here are the relevant bits...
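# (for example, a filter along these lines; the patterns are my guess based on
# the messages below)
> grep -E 'oom-kill|Out of memory|OOM killer|Killing process' with-systemd.log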
...
Mar 17 22:36:49 ip-172-31-15-98.us-west-1.compute.internal kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/system.slice/docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope/kubepods/besteffort/podd8dcbcc6-5695-4b90-823b-ac11f60809ec/d1d9d0f61ec5d843e253fabee7c19ccd6623b752a2a7cb2de8deda6e7bc067d0,task=tail,pid=20227,uid=0
Mar 17 22:36:49 ip-172-31-15-98.us-west-1.compute.internal kernel: Out of memory: Killed process 20227 (tail) total-vm:799520kB, anon-rss:797192kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:1600kB oom_score_adj:1000
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: A process of this unit has been killed by the OOM killer.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 17092 (metrics-server) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 17105 (metrics-server) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 17108 (metrics-server) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 17109 (metrics-server) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 17110 (metrics-server) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 17115 (metrics-server) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 17165 (n/a) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 17166 (metrics-server) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 16667 (pause) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 16896 (coredns) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 16914 (coredns) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 16915 (coredns) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 16918 (coredns) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 16919 (coredns) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 17167 (coredns) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 16685 (pause) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 19736 (pause) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 17744 (pause) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 19896 (pause) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 19974 (bash) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 17848 (pause) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 18239 (traefik) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 18259 (traefik) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 18260 (n/a) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 18261 (traefik) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 18262 (traefik) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 18263 (traefik) with signal SIGKILL.
Mar 17 22:36:50 ip-172-31-15-98.us-west-1.compute.internal systemd[1]: docker-fc22a244e4e9a1704d1b2e9d1a5bdf3e2f66c0de47ce462a109fb474ec7da0aa.scope: Killing process 18264 (n/a) with signal SIGKILL.
... (and many more processes)
Note that in the above logs, the OOM-killer kills the problematic `tail` process, but `systemd` then proceeds to kill all other processes, presumably in the same cgroup hierarchy, which kills all sibling containers, all pods on the k3d node, and the k3d node itself.
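This looks like systemd's per-unit OOM handling at work: with the `systemd` cgroup driver, every process of the k3d node container lives in the single `docker-<id>.scope` unit, and a unit whose `OOMPolicy=` is `kill` has systemd SIGKILL all of its remaining processes as soon as the kernel OOM-kills any one of them. Assuming that is the mechanism here, the policy on the node's scope can be inspected like so (the container ID is a placeholder):

> systemctl show docker-<container-id>.scope --property=OOMPolicy

If that reports kill, it would explain the cascade: the exec'd `tail`, every pod, and the k3s supervisor itself all share that one scope, so they all go down together.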
What I expected to happen
I expected the problematic process to be killed, perhaps even the parent container and pod, but not the entire node. More importantly, I expected to still have a functional k3d cluster.
I was able to narrow this down specifically to when the cgroup driver is `systemd`. I do not see this behavior when I switch the cgroup driver to `cgroupfs`.
When I change the cgroup driver to `cgroupfs` (see https://stackoverflow.com/a/65870152), I am able to observe nominal behavior again.
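For reference, the switch boils down to the standard Docker daemon setting described in the linked answer, roughly:

# /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
# then restart the daemon
> sudo systemctl restart docker

Executing the same repro steps as above, this is what I observe instead...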
> docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ee5c17adec35 ghcr.io/k3d-io/k3d-proxy:5.4.9 "/bin/sh -c nginx-pr…" About a minute ago Up About a minute 80/tcp, 0.0.0.0:45585->6443/tcp k3d-k3s-default-serverlb
46af8629abf5 rancher/k3s:v1.25.7-k3s1 "/bin/k3d-entrypoint…" About a minute ago Up About a minute k3d-k3s-default-server-0
> k3d node list
NAME ROLE CLUSTER STATUS
k3d-k3s-default-server-0 server k3s-default running
k3d-k3s-default-serverlb loadbalancer k3s-default running
> k3d cluster list
NAME SERVERS AGENTS LOADBALANCER
k3s-default 1/1 0/0 true
> kubectl get nodes
NAME STATUS ROLES AGE VERSION
k3d-k3s-default-server-0 Ready control-plane,master 96s v1.25.7+k3s1
And the `journalctl` logs also show nominal behavior...
> sudo journalctl --since "Mar 17 00:00:00" > with-cgroupfs.log
# once again parsing that log, here are the relevant bits...
...
Mar 17 22:43:49 ip-172-31-15-98.us-west-1.compute.internal kernel: Out of memory: Killed process 30435 (tail) total-vm:901596kB, anon-rss:899060kB, file-rss:4kB, shmem-rss:0kB, UID:0 pgtables:1800kB oom_score_adj:1000
Note that in the above logs, no processes other than the problematic `tail` process are killed.
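As a quick sanity check from inside the cluster, the pod and node can be listed again after the kill; with `cgroupfs` both should still be reachable (commands only, output omitted):

> kubectl get pod testpod
> kubectl get nodes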
Screenshots of terminal output
I have been able to replicate this by spinning up a vanilla CentOS Stream 9 image in EC2 using the Amazon CentOS Stream 9 AMI. Here are my results from the 2 different configurations, with the repro and status commands highlighted...
NOTE: The cgroup driver can be changed by following the instructions here - https://stackoverflow.com/a/65870152
With `systemd` as the cgroup driver:
[screenshot of terminal output]
With `cgroupfs` as the cgroup driver:
[screenshot of terminal output]
Which OS & Architecture
> k3d runtime-info
arch: x86_64
cgroupdriver: systemd
cgroupversion: "2"
endpoint: /var/run/docker.sock
filesystem: xfs
infoname: ip-172-31-15-98.us-west-1.compute.internal
name: docker
os: CentOS Stream 9
ostype: linux
version: 23.0.1
> cat /etc/os-release
NAME="CentOS Stream"
VERSION="9"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="9"
PLATFORM_ID="platform:el9"
PRETTY_NAME="CentOS Stream 9"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:centos:centos:9"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_SUPPORT_PRODUCT_VERSION="CentOS Stream"
Which version of k3d
> k3d version
k3d version v5.4.9
k3s version v1.25.7-k3s1 (default)
Which version of docker
> docker version
Client: Docker Engine - Community
Version: 23.0.1
API version: 1.42
Go version: go1.19.5
Git commit: a5ee5b1
Built: Thu Feb 9 19:49:35 2023
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 23.0.1
API version: 1.42 (minimum version 1.12)
Go version: go1.19.5
Git commit: bc3805a
Built: Thu Feb 9 19:46:32 2023
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.18
GitCommit: 2456e983eb9e37e47538f59ea18f2043c9a73640
runc:
Version: 1.1.4
GitCommit: v1.1.4-0-g5fd4c4d
docker-init:
Version: 0.19.0
GitCommit: de40ad0
> docker info
Client:
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.10.2
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.16.0
Path: /usr/libexec/docker/cli-plugins/docker-compose
scan: Docker Scan (Docker Inc.)
Version: v0.23.0
Path: /usr/libexec/docker/cli-plugins/docker-scan
Server:
Containers: 2
Running: 2
Paused: 0
Stopped: 0
Images: 3
Server Version: 23.0.1
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 2456e983eb9e37e47538f59ea18f2043c9a73640
runc version: v1.1.4-0-g5fd4c4d
init version: de40ad0
Security Options:
seccomp
Profile: builtin
cgroupns
Kernel Version: 5.14.0-229.el9.x86_64
Operating System: CentOS Stream 9
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.699GiB
Name: ip-172-31-15-98.us-west-1.compute.internal
ID: 71b23c0e-f99d-4a0b-8fbc-548459dd9832
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false