Cannot spawn container - no loop devices available
What are the steps to reproduce this issue?
- Create a Kubernetes deployment
- Pods fail to be deployed
What happens?
Pods fail to be created with:
Error: could not create container: could not spawn container: could not create oci bundle: could not create SIF bundle: failed to find loop device: failed to attach image /var/lib/singularity/c894da36f0c207a33e862b9a38b3a66d7e02857aa959493df3ff455830f305f8: no loop devices available
What were you expecting to happen?
Pod to be created.
Any logs, error output, comments, etc?
There are plenty of loop devices:
[k8swrk3]/tmp/singularity-cri% ls -l /dev | grep -i loop | wc -l
1612
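For reference, the node count alone doesn't show how many of those devices are actually free; the attached vs. free split can be checked with standard losetup (available on RHEL 7):

# Count loop devices that currently have a backing file attached
losetup -a | wc -l

# Print the first unused loop device, or fail if none is free
losetup -f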
I've tried this with the setuid bit both set and unset on the Singularity binary, with no luck, as I'd seen reports that it can cause issues.
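For anyone reproducing this, a rough way to check which mode is in effect (paths assume a default RPM install; source builds usually live under /usr/local instead):

# Does the setuid starter binary carry the setuid bit?
ls -l /usr/libexec/singularity/bin/starter-suid

# Does singularity.conf allow the setuid workflow?
grep 'allow setuid' /etc/singularity/singularity.conf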
sycri logs and the Kubernetes manifest are appended at the bottom.
I can also run containers happily on the workers directly:
[robinsla@k8swrk3]/tmp/singularity-cri% singularity run shub://GodloveD/lolcow
 ________________________________________
/ Tomorrow will be cancelled due to lack \
\ of interest.                           /
 ----------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
Environment?
OS distribution and version: RHEL 7 (3.10.0-1062.12.1.el7.x86_64)
go version: go1.13.3
Singularity-CRI version: 1.0.0-beta.7
Singularity version: 3.5.2-1.1
Kubernetes version: v1.18.0
Manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: hello-world
  name: hello-world
spec:
  selector:
    matchLabels:
      run: hello-world
  replicas: 4
  template:
    metadata:
      labels:
        run: hello-world
    spec:
      hostNetwork: true
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
          name: web
          protocol: TCP
SYCRI logs
Apr 14 07:08:39 sycri[30593]: E0414 07:08:39.924688 30593 main.go:276] /runtime.v1alpha2.RuntimeService/CreateContainer
Apr 14 07:08:39 sycri[30593]: Request: {"pod_sandbox_id":"a03bcbd94bbde213829948d35dd4a58a8d94752c48805185e2a87993d20d6d38","config":{"metadata":{"name":"nginx"},"image":{"image":"a76b355b668c43aca9432a3e8e15b2f17878966fbebadebcb7d45df68b314dd3"},"envs":[{"key":"KUBERNETES_SERVICE_HOST","value":"192.168.240.1"},{"key":"KUBERNETES_SERVICE_PORT","value":"443"},{"key":"KUBERNETES_SERVICE_PORT_HTTPS","value":"443"},{"key":"KUBERNETES_PORT","value":"tcp://192.168.240.1:443"},{"key":"KUBERNETES_PORT_443_TCP","value":"tcp://192.168.240.1:443"},{"key":"KUBERNETES_PORT_443_TCP_PROTO","value":"tcp"},{"key":"KUBERNETES_PORT_443_TCP_PORT","value":"443"},{"key":"KUBERNETES_PORT_443_TCP_ADDR","value":"192.168.240.1"}],"mounts":[{"container_path":"/var/run/secrets/kubernetes.io/serviceaccount","host_path":"/var/lib/kubelet/pods/8bff18c4-3e7b-4a0d-b732-883a3cef54b9/volumes/kubernetes.io~secret/default-token-85h4q","readonly":true,"selinux_relabel":true},{"container_path":"/etc/hosts","host_path":"/var/lib/kubelet/pods/8bff18c4-3e7b-4a0d-b732-883a3cef54b9/etc-hosts","selinux_relabel":true},{"container_path":"/dev/termination-log","host_path":"/var/lib/kubelet/pods/8bff18c4-3e7b-4a0d-b732-883a3cef54b9/containers/nginx/31a5bd44","selinux_relabel":true}],"labels":{"io.kubernetes.container.name":"nginx","io.kubernetes.pod.name":"hello-world-686ff49dc9-pv2rr","io.kubernetes.pod.namespace":"default","io.kubernetes.pod.uid":"8bff18c4-3e7b-4a0d-b732-883a3cef54b9"},"annotations":{"io.kubernetes.container.hash":"cf08d707","io.kubernetes.container.ports":"[{\"name\":\"web\",\"hostPort\":80,\"containerPort\":80,\"protocol\":\"TCP\"}]","io.kubernetes.container.restartCount":"0","io.kubernetes.container.terminationMessagePath":"/dev/termination-log","io.kubernetes.container.terminationMessagePolicy":"File","io.kubernetes.pod.terminationGracePeriod":"30"},"log_path":"nginx/0.log","linux":{"resources":{"cpu_period":100000,"cpu_shares":2,"oom_score_adj":1000},"security_context":{"namespace_options":{"network":2,"pid":1},"run_as_user":{},"seccomp_profile_path":"unconfined","masked_paths":["/proc/acpi","/proc/kcore","/proc/keys","/proc/latency_stats","/proc/timer_list","/proc/timer_stats","/proc/sched_debug","/proc/scsi","/sys/firmware"],"readonly_paths":["/proc/asound","/proc/bus","/proc/fs","/proc/irq","/proc/sys","/proc/sysrq-trigger"]}}},"sandbox_config":{"metadata":{"name":"hello-world-686ff49dc9-pv2rr","uid":"8bff18c4-3e7b-4a0d-b732-883a3cef54b9","namespace":"default"},"log_directory":"/var/log/pods/default_hello-world-686ff49dc9-pv2rr_8bff18c4-3e7b-4a0d-b732-883a3cef54b9","port_mappings":[{"container_port":80,"host_port":80}],"labels":{"io.kubernetes.pod.name":"hello-world-686ff49dc9-pv2rr","io.kubernetes.pod.namespace":"default","io.kubernetes.pod.uid":"8bff18c4-3e7b-4a0d-b732-883a3cef54b9","pod-template-hash":"686ff49dc9","run":"hello-world"},"annotations":{"kubernetes.io/config.seen":"2020-04-14T07:00:40.341542095-04:00","kubernetes.io/config.source":"api"},"linux":{"cgroup_parent":"/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod8bff18c4_3e7b_4a0d_b732_883a3cef54b9.slice","security_context":{"namespace_options":{"network":2,"pid":1}}}}}
Apr 14 07:08:39 sycri[30593]: Response: null
Apr 14 07:08:39 sycri[30593]: Error: rpc error: code = Internal desc = could not create container: could not spawn container: could not create oci bundle: could not create SIF bundle: failed to find loop device: failed to attach image /var/lib/singularity/a76b355b668c43aca9432a3e8e15b2f17878966fbebadebcb7d45df68b314dd3: no loop devices available
I'd like to +1 this issue. We also see it on Kubernetes 1.17.
We have found the reason why this happens.
When containers are stuck in a CrashLoopBackOff state, Singularity-CRI seems to exhaust the pool of loop devices faster than it can clean them up.
You can confirm this by comparing the number of attached loop devices to the number of loop device nodes in /dev:
losetup --list | wc -l
ls /dev/loop[0-9]* | wc -l
If the counts match, all of your available loop devices are in use.
I suppose you could try to increase the number of loop devices as a stopgap, but the real question is why Singularity is unable to clean up the old, unused ones.
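For reference, a rough sketch of both stopgaps; the /var/lib/singularity path matches the image store from the error above, and the 8-63 range below is only an example:

# Detach loop devices whose backing file lives in the Singularity-CRI image store.
# Only safe while no containers backed by those images are running.
for dev in $(losetup -a | grep '/var/lib/singularity/' | cut -d: -f1); do
    sudo losetup -d "$dev"
done

# Create additional loop device nodes (the loop driver uses block major 7).
for i in $(seq 8 63); do
    [ -e /dev/loop$i ] || sudo mknod -m 0660 /dev/loop$i b 7 $i
done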