conmon-rs

Pod cannot be deleted due to missing container startup command

Open • Bevisy opened this issue 1 year ago • 3 comments

What happened?

Create a pod and container using the following pod-config.json and container-config-nginx.json:

# cat pod-config.json
{
    "metadata": {
        "name": "nginx-sandbox",
        "namespace": "default",
        "attempt": 1,
        "uid": "hdishd83djaidwnduwk28bcsb"
    },
    "log_directory": "/tmp",
    "linux": {
    }
}

# cat container-config-nginx.json
{
  "metadata": {
      "name": "nginx-0"
  },
  "image":{
      "image": "docker.io/library/nginx:latest"
  },
  "command": [
      "top"
  ],
  "linux": {
  }
}

Container creation then fails, because the nginx image does not contain a `top` executable:

# crictl run container-config-nginx.json pod-config.json
FATA[0012] running container: creating container failed: rpc error: code = Unknown desc = create container: create result: internal/proto/conmon.capnp:Conmon.createContainer: Failed: child command exited with: 1: executable file `top` not found in $PATH: No such file or directory

At this point, the container process on the node becomes a zombie process, and the pod cannot be deleted.

      1   15487   15486    2552 pts/1      11037 Sl       0   0:00 /usr/bin/crio-conmonrs --runtime /usr/bin/crio-crun --runtime-dir /var/lib/containers/storage/overlay-containers/7d46c4f2908be02f02465923ca1aca87295e8872231dae236287fe69209fdec9/userdata --runtime-root /run/crun --log-level info --log-driver systemd --cgroup-manager systemd
  15487   15496   15496   15496 ?             -1 Ss       0   0:00  \_ /pause
  15487   15509   15486    2552 pts/1      11037 Z        0   0:00  \_ [3] <defunct>
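
For anyone triaging similar reports, here is a minimal sketch of how the defunct child can be picked out of such a listing programmatically. It assumes the listing uses the column layout of `ps axjf` (PPID, PID, PGID, SID, TTY, TPGID, STAT, UID, TIME, COMMAND); the issue does not state which `ps` invocation was used, so treat that layout as an assumption. The names `PsEntry`, `parse_ps_axjf` and `zombies` are hypothetical helpers, not part of conmon-rs or CRI-O.

```python
from typing import List, NamedTuple


class PsEntry(NamedTuple):
    ppid: int
    pid: int
    stat: str
    command: str


def parse_ps_axjf(output: str) -> List[PsEntry]:
    """Parse lines assumed to follow the `ps axjf` column layout:
    PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND."""
    entries = []
    for line in output.splitlines():
        # Split into at most 10 fields so the command keeps its spaces.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip the header row and blank lines
        entries.append(PsEntry(int(fields[0]), int(fields[1]), fields[6], fields[9]))
    return entries


def zombies(entries: List[PsEntry]) -> List[PsEntry]:
    """Return entries whose STAT column starts with 'Z' (defunct)."""
    return [e for e in entries if e.stat.startswith("Z")]


# Listing from the report; the conmonrs command line is shortened here.
SAMPLE = r"""
      1   15487   15486    2552 pts/1      11037 Sl       0   0:00 /usr/bin/crio-conmonrs --runtime /usr/bin/crio-crun
  15487   15496   15496   15496 ?             -1 Ss       0   0:00  \_ /pause
  15487   15509   15486    2552 pts/1      11037 Z        0   0:00  \_ [3] <defunct>
"""

if __name__ == "__main__":
    entries = parse_ps_axjf(SAMPLE)
    by_pid = {e.pid: e for e in entries}
    for z in zombies(entries):
        parent = by_pid.get(z.ppid)
        parent_cmd = parent.command if parent else "unknown"
        print(f"zombie pid={z.pid} parent pid={z.ppid} ({parent_cmd})")
```

In the listing above, the defunct process has conmonrs as its parent, which means conmonrs is the process that would have to reap it.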

What did you expect to happen?

The container process should exit normally and be reaped instead of remaining as a zombie process.
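
As background on what "zombie" means here, the sketch below shows the general Unix mechanism: a child that has exited stays in state `Z` until its parent collects the exit status with a `wait` call. This is a generic illustration, not the conmon-rs or crun code path. It assumes Linux with `/proc` mounted, and `proc_state` and `demo` are hypothetical helper names.

```python
import os
from typing import Optional


def proc_state(pid: int) -> Optional[str]:
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # Format is "pid (comm) state ..."; comm may contain spaces or
    # parentheses, so split after the last ')'.
    return data.rsplit(")", 1)[1].split()[0]


def demo() -> dict:
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately with status 1, similar to a runtime
        # whose exec of the container command failed.
        os._exit(1)

    # Parent: block until the child has exited, but leave it unreaped
    # (WNOWAIT), so it can be observed in the zombie state.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    state_before = proc_state(pid)

    # Reap the child; this removes its process table entry.
    _, status = os.waitpid(pid, 0)
    state_after = proc_state(pid)

    return {
        "state_before_reap": state_before,
        "exit_code": os.WEXITSTATUS(status),
        "state_after_reap": state_after,
    }


if __name__ == "__main__":
    print(demo())
```

Until the parent performs that final `waitpid`, the exited child keeps its PID and shows up as `<defunct>` in `ps`, which matches the symptom reported above.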

How can we reproduce it (as minimally and precisely as possible)?

See the steps under "What happened?" above.

Anything else we need to know?

No response

CRI-O and Kubernetes version

conmonrs version: v0.6.3

OS version

# On Linux:
$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
$ uname -a
Linux lima-crio 6.1.0-21-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64 GNU/Linux

Additional environment details (AWS, VirtualBox, physical, etc.)

nothing else

Bevisy • Jun 10 '24 11:06

See also: https://github.com/cri-o/cri-o/issues/8272#issuecomment-2158040886

Bevisy • Jun 10 '24 11:06

While investigating this issue, I also discovered that when I switch from crun to runc, zombie processes are not generated, and this issue does not occur. Related issue: https://github.com/containers/crun/issues/1482

Bevisy • Jun 11 '24 06:06

@Bevisy ah good point, I can reproduce the same with crun but not runc. I was going to find a way to fix it in conmon-rs but found no real solution yet. Maybe it has to be fixed in crun then :thinking:

saschagrunert • Jun 11 '24 12:06