Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "<exec_name>": stat <exec_name>: no such file or directory: unknown
Summary
MicroK8s was running fine yesterday, until suddenly every pod went into either CrashLoopBackOff or RunContainerError, with an error log similar to the one in the title:
Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "<exec_name>": stat <exec_name>: no such file or directory: unknown
where <exec_name> varies with each pod's application.
One of the pod events is:
Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/coredns": stat /coredns: no such file or directory: unknown
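Since <exec_name> differs per pod, it can help to pull the missing path out of each error line, for example to group failing pods by entrypoint. A minimal sketch, assuming a POSIX shell with sed; extract_exec_path is a hypothetical helper name, not part of MicroK8s or containerd:

```shell
# Hypothetical helper: print the path that runc failed to stat,
# taken from the `exec: "<path>": stat ...` part of an error line.
extract_exec_path() {
  printf '%s\n' "$1" | sed -n 's/.*exec: "\([^"]*\)": stat .*/\1/p'
}

msg='Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/coredns": stat /coredns: no such file or directory: unknown'
extract_exec_path "$msg"
```

This only handles plain double quotes as they appear in pod events; the containerd journal escapes them as \" and would need a different pattern.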
containerd-template.toml:
# Use config version 2 to enable new configuration fields.
version = 2
oom_score = 0
[grpc]
uid = 0
gid = 0
max_recv_message_size = 16777216
max_send_message_size = 16777216
[debug]
address = ""
uid = 0
gid = 0
[metrics]
address = "127.0.0.1:1338"
grpc_histogram = false
[cgroup]
path = ""
# The 'plugins."io.containerd.grpc.v1.cri"' table contains all of the server options.
[plugins."io.containerd.grpc.v1.cri"]
stream_server_address = "127.0.0.1"
stream_server_port = "0"
enable_selinux = false
sandbox_image = "registry.k8s.io/pause:3.9"
stats_collect_period = 10
enable_tls_streaming = false
max_container_log_line_size = 16384
# 'plugins."io.containerd.grpc.v1.cri".containerd' contains config related to containerd
[plugins."io.containerd.grpc.v1.cri".containerd]
# snapshotter is the snapshotter used by containerd.
snapshotter = "overlayfs"
# no_pivot disables pivot-root (linux only), required when running a container in a RamDisk with runc.
# This only works for runtime type "io.containerd.runtime.v1.linux".
no_pivot = false
# default_runtime_name is the default runtime name to use.
default_runtime_name = "runc"
# 'plugins."io.containerd.grpc.v1.cri".containerd.runtimes' is a map from CRI RuntimeHandler strings, which specify types
# of runtime configurations, to the matching configurations.
# In this example, 'runc' is the RuntimeHandler string to match.
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
# runtime_type is the runtime type to use in containerd e.g. io.containerd.runtime.v1.linux
runtime_type = "io.containerd.runc.v1"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-container-runtime]
# runtime_type is the runtime type to use in containerd e.g. io.containerd.runtime.v1.linux
runtime_type = "io.containerd.runc.v1"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia-container-runtime.options]
BinaryName = "nvidia-container-runtime"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
runtime_type = "io.containerd.kata.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata.options]
BinaryName = "kata-runtime"
# 'plugins."io.containerd.grpc.v1.cri".cni' contains config related to cni
[plugins."io.containerd.grpc.v1.cri".cni]
# bin_dir is the directory in which the binaries for the plugin is kept.
bin_dir = "/var/snap/microk8s/7229/opt/cni/bin"
# conf_dir is the directory in which the admin places a CNI conf.
conf_dir = "/var/snap/microk8s/7229/args/cni-network"
# 'plugins."io.containerd.grpc.v1.cri".registry' contains config related to the registry
[plugins."io.containerd.grpc.v1.cri".registry]
config_path = "/var/snap/microk8s/7229/args/certs.d"
microk8s v1.31.1 revision 7229 (edit: upgrading to v1.31.2 rev7394 still does not solve it)
containerd v1.6.28 (client and server)
calico v3.25.1
What Should Happen Instead?
Everything works normally
Reproduction Steps
microk8s stop
microk8s start
Introspection Report
inspection-report-20241107_162205.tar.gz
Can you suggest a fix?
Are you interested in contributing with a fix?
After running chmod 777 on /mnt/<nfs_mount>, some of the pods started working normally. I don't know how this fixed some of them. Some pods are still failing with: Error: failed to create "containerd" task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec:
Hello,
Hm, I'm seeing some other errors in your containerd logs as well, which are interesting:
Nov 07 15:49:04 devnode microk8s.daemon-containerd[43022]: time="2024-11-07T15:49:04.009457956+07:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:kai-beta-dms-worker-75965bbf87-vnpvw,Uid:060e839f-b0db-40ff-afd7-62021477dc61,Namespace:kai-beta-dms,Attempt:14,} failed, error" error="failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: \"/pause\": stat /pause: no such file or directory: unknown"
That binary should come from the pause image. I see registry.k8s.io/pause:3.9 is set as the sandbox image (though I see references to registry.k8s.io/pause:3.7 in the logs as well). It's a simple image with a simple binary, meant to just exist indefinitely; it shouldn't be missing or have any dependencies. There might be something going on with the host itself. Out of curiosity, what CPU architecture do you have? amd64, arm64, other? Have you run something that would "emulate" other platforms, like qemu or binfmt (https://github.com/tonistiigi/binfmt)?
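A quick way to answer both questions on the host might look like the following sketch. It assumes a Linux host; /proc/sys/fs/binfmt_misc is where the kernel lists registered binfmt handlers, and it may be absent if binfmt_misc is not mounted:

```shell
# Print the host CPU architecture (for example x86_64 or aarch64).
uname -m

# List registered binfmt_misc handlers, if the filesystem is mounted.
# The "register" and "status" entries are always there; extra entries with
# names such as qemu-aarch64 would suggest emulation for other platforms.
if [ -d /proc/sys/fs/binfmt_misc ]; then
  ls /proc/sys/fs/binfmt_misc || true
fi
```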
Can you try spawning a simple container? Try the following (note that pause has no output, but ideally won't have any error; you can CTRL+C afterwards, you should see Shutting down, got signal: Interrupt):
microk8s ctr image pull registry.k8s.io/pause:3.9
microk8s ctr run registry.k8s.io/pause:3.9 foo
What about:
docker run --rm -ti registry.k8s.io/pause:3.9
Architecture is x86_64, running RHEL 8.7 (Ootpa).
Running microk8s ctr run registry.k8s.io/pause:3.9 foo returns shutting down, got signal: Interrupt.
This cluster previously ran into multiple errors that were fixed by other people. I'm the one currently fixing this error, so I don't really know what the others did to fix the previous ones. Also, the /var folder gets chmod-ed to 777 on every server restart. I have also tried chmod -R 777 on the /var/snap/microk8s/common/var/lib/containerd and /var/snap/microk8s/common/run/ folders, but I still get the same error.
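To see which of these directories are currently world-writable (for instance after a restart resets /var), something like the following could be used. It is only a sketch: is_world_writable is a hypothetical helper, and it relies on find supporting -maxdepth, as GNU find on RHEL does:

```shell
# Hypothetical helper: succeed if the given path is world-writable.
is_world_writable() {
  # -H follows a symlink given on the command line, -maxdepth 0 restricts
  # find to the path itself, and -perm -0002 matches when the "other"
  # write bit is set.
  [ -n "$(find -H "$1" -maxdepth 0 -perm -0002 2>/dev/null)" ]
}

for dir in /var /var/snap/microk8s/common/var/lib/containerd /var/snap/microk8s/common/run; do
  if is_world_writable "$dir"; then
    echo "world-writable: $dir"
  fi
done
```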
pause was originally at version 3.7; I changed it to 3.9 to fix some errors.
Everything works fine now, after several changes, any of which might have been the actual fix:
- Disabling the Kaspersky endpoint agent
- Upgrading the container image (the image is fine on 1.25, but there is probably a breaking change in >1.29)
- chmod -R 0755 /var
- Upgrading snap to 2.65.1-0.el8
- Refreshing microk8s certs
The error is back, with a similar message:
Error: failed to create containerd task: failed to create shim task, OCI runtime create failed: runc create failed: unable to start container process: exec "/opt/mendix/entrypoint": stat /opt/mendix/entrypoint: no such file or directory: unknown
edit: currently this happens because the image was not rebuilt after one of the fixes listed above. I will post another update after solving this issue.
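One way to confirm whether a rebuilt image actually contains the entrypoint is to list the files in its exported filesystem. A minimal sketch, assuming Docker is available on a build machine and that the exported tar lists entries without a leading slash or ./ prefix; tar_has_path is a hypothetical helper and <image> is a placeholder:

```shell
# Hypothetical helper: read a tar archive on stdin and succeed if it
# contains the given absolute path (compared without the leading slash).
tar_has_path() {
  tar -tf - | grep -xF "${1#/}" >/dev/null
}

# Usage with Docker (replace <image> with the real image reference):
#   cid=$(docker create <image>)
#   docker export "$cid" | tar_has_path /opt/mendix/entrypoint && echo present
#   docker rm "$cid"
```

Exporting the filesystem avoids depending on tools such as ls being present inside the image.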