Pod cannot be deleted when the container's startup command is not found in the image
What happened?
Using pod-config.json and container-config-nginx.json to create a pod:
# cat pod-config.json
{
  "metadata": {
    "name": "nginx-sandbox",
    "namespace": "default",
    "attempt": 1,
    "uid": "hdishd83djaidwnduwk28bcsb"
  },
  "log_directory": "/tmp",
  "linux": {
  }
}
# cat container-config-nginx.json
{
  "metadata": {
    "name": "nginx-0"
  },
  "image": {
    "image": "docker.io/library/nginx:latest"
  },
  "command": [
    "top"
  ],
  "linux": {
  }
}
Running the container with crictl then fails at container creation:
# crictl run container-config-nginx.json pod-config.json
FATA[0012] running container: creating container failed: rpc error: code = Unknown desc = create container: create result: internal/proto/conmon.capnp:Conmon.createContainer: Failed: child command exited with: 1: executable file `top` not found in $PATH: No such file or directory
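The error says the runtime could not resolve `top` in the container's $PATH, i.e. the nginx image apparently does not ship a `top` binary. A minimal sketch of that kind of lookup, assuming an unpacked image rootfs on the host and a typical default PATH; `resolve_in_rootfs` and `DEFAULT_PATH` are hypothetical names for illustration, not part of crun, runc or conmon-rs:

```python
import os
import stat

# Assumed default PATH for illustration; the real value comes from the
# image config / container spec environment.
DEFAULT_PATH = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"


def resolve_in_rootfs(rootfs: str, command: str, path: str = DEFAULT_PATH):
    """Return the in-container path of `command`, or None if it is not found.

    Hypothetical helper that mimics a $PATH lookup against an unpacked rootfs.
    Simplifications: a command containing '/' is not searched in PATH, empty
    PATH entries are skipped, and symlinks are followed on the host, so
    absolute symlinks inside the rootfs are not handled correctly.
    """
    if "/" in command:
        candidates = [command]
    else:
        candidates = [os.path.join(d, command) for d in path.split(":") if d]
    for candidate in candidates:
        host_path = os.path.join(rootfs, candidate.lstrip("/"))
        try:
            st = os.stat(host_path)
        except OSError:
            continue  # not present in this directory
        # Must be a regular file with at least one execute bit set.
        if stat.S_ISREG(st.st_mode) and st.st_mode & 0o111:
            return candidate
    return None
```

Pointing it at an unpacked copy of the nginx image's filesystem and asking for `top` would be expected to return `None`, which corresponds to the "not found in $PATH" failure above.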
At this point, the container process on the node is left behind as a zombie (`<defunct>`) child of crio-conmonrs, and the pod cannot be deleted:
1 15487 15486 2552 pts/1 11037 Sl 0 0:00 /usr/bin/crio-conmonrs --runtime /usr/bin/crio-crun --runtime-dir /var/lib/containers/storage/overlay-containers/7d46c4f2908be02f02465923ca1aca87295e8872231dae236287fe69209fdec9/userdata --runtime-root /run/crun --log-level info --log-driver systemd --cgroup-manager systemd
15487 15496 15496 15496 ? -1 Ss 0 0:00 \_ /pause
15487 15509 15486 2552 pts/1 11037 Z 0 0:00 \_ [3] <defunct>
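The `Z` state and `<defunct>` marker mean the process has already exited but its parent has not yet collected its exit status. A minimal sketch for spotting such processes on a Linux node by reading /proc/<pid>/stat directly, as a scriptable alternative to eyeballing `ps` output; the function names are hypothetical, and the `proc` parameter exists only so the scan can be pointed at a fake directory:

```python
import os


def parse_stat(text: str):
    """Parse the contents of /proc/<pid>/stat into (pid, comm, state, ppid).

    The comm field is wrapped in parentheses and may itself contain spaces or
    ')', so state and ppid are read after the *last* ')'.
    """
    lpar = text.index("(")
    rpar = text.rindex(")")
    pid = int(text[:lpar])
    comm = text[lpar + 1:rpar]
    state, ppid = text[rpar + 2:].split()[:2]
    return pid, comm, state, int(ppid)


def list_zombies(proc: str = "/proc"):
    """Return a sorted list of (pid, ppid, comm) for processes in state 'Z'."""
    zombies = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state, ppid = parse_stat(f.read())
        except OSError:
            continue  # process vanished while scanning
        if state == "Z":
            zombies.append((pid, ppid, comm))
    return sorted(zombies)
```

On the affected node, `list_zombies()` should report the defunct PID together with the crio-conmonrs PID as its parent, which makes it easy to check whether a fix actually clears the zombie.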
What did you expect to happen?
The container process should exit and be reaped normally instead of being left as a zombie, so that the pod can be deleted.
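In general, a zombie only goes away once its parent calls wait()/waitpid() on it (or the parent exits and the orphan is reaped by its new parent). A minimal, Linux-only sketch of that generic reaping step: it forks a child that exits with status 1 as a stand-in for the failed `top` exec, observes it in state `Z`, then reaps it. This illustrates the kernel mechanism only; it is not how conmon-rs or crun actually manage their children, and `demo_reap`/`proc_state` are hypothetical names:

```python
import os
import time


def proc_state(pid: int) -> str:
    """Return the single-letter state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        text = f.read()
    return text[text.rindex(")") + 2]


def demo_reap(timeout: float = 5.0):
    """Fork a child that exits with status 1, observe it as a zombie,
    then reap it with waitpid() the way a supervising parent should."""
    pid = os.fork()
    if pid == 0:
        os._exit(1)  # child: simulate the command failing with exit code 1
    # Until the parent waits, the exited child remains in state 'Z'.
    deadline = time.monotonic() + timeout
    while proc_state(pid) != "Z":
        if time.monotonic() > deadline:
            raise TimeoutError("child did not exit in time")
        time.sleep(0.01)
    before = proc_state(pid)
    _, status = os.waitpid(pid, 0)  # reap: collect exit status, free the PID
    code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
    reaped = not os.path.exists(f"/proc/{pid}")
    return before, code, reaped
```

Calling `demo_reap()` returns the state seen before reaping, the child's exit code, and whether the /proc entry disappeared afterwards.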
How can we reproduce it (as minimally and precisely as possible)?
See the steps under "What happened?" above.
Anything else we need to know?
No response
CRI-O and Kubernetes version
conmonrs version: v0.6.3
OS version
# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
$ uname -a
Linux lima-crio 6.1.0-21-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64 GNU/Linux
Additional environment details (AWS, VirtualBox, physical, etc.)
See also: https://github.com/cri-o/cri-o/issues/8272#issuecomment-2158040886
While investigating this issue, I also found that after switching the runtime from crun to runc, no zombie process is left behind and the issue does not occur. Related issue: https://github.com/containers/crun/issues/1482
@Bevisy ah good point, I can reproduce the same with crun but not with runc. I tried to find a way to fix it in conmon-rs but haven't found a real solution yet. Maybe it has to be fixed in crun then :thinking: