Cannot spawn container - no loop devices available
What are the steps to reproduce this issue?
- Create a Kubernetes deployment
- Pods fail to be deployed
What happens?
Pods fail to be created with:
Error: could not create container: could not spawn container: could not create oci bundle: could not create SIF bundle: failed to find loop device: failed to attach image /var/lib/singularity/c894da36f0c207a33e862b9a38b3a66d7e02857aa959493df3ff455830f305f8: no loop devices available
What were you expecting to happen?
Pod to be created.
Any logs, error output, comments, etc?
There are plenty of loop devices:
[k8swrk3]/tmp/singularity-cri% ls -l /dev | grep -i loop | wc -l
1612
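For reference, the node count alone doesn't show how many of those devices are actually free; the attached vs. free split can be checked with standard losetup (available on RHEL 7):

# Count loop devices that currently have a backing file attached
losetup -a | wc -l

# Print the first unused loop device, or fail if none is free
losetup -f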
I've tried this with the setuid bit both set and unset on the Singularity binary, with no luck, as I'd seen reports that it can cause issues.
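For anyone reproducing this, a rough way to check which mode is in effect (paths assume a default RPM install; source builds usually live under /usr/local instead):

# Does the setuid starter binary carry the setuid bit?
ls -l /usr/libexec/singularity/bin/starter-suid

# Does singularity.conf allow the setuid workflow?
grep 'allow setuid' /etc/singularity/singularity.conf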
sycri logs and the Kubernetes manifest are appended at the bottom.
I can also run containers happily on the workers directly:
[robinsla@k8swrk3]/tmp/singularity-cri% singularity run shub://GodloveD/lolcow
 ________________________________________
/ Tomorrow will be cancelled due to lack \
\ of interest.                           /
 ----------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
Environment?
OS distribution and version: RHEL 7 (3.10.0-1062.12.1.el7.x86_64)
go version: go1.13.3
Singularity-CRI version: 1.0.0-beta.7
Singularity version: 3.5.2-1.1
Kubernetes version: v1.18.0
Manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: hello-world
  name: hello-world
spec:
  selector:
    matchLabels:
      run: hello-world
  replicas: 4
  template:
    metadata:
      labels:
        run: hello-world
    spec:
      hostNetwork: true
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
          name: web
          protocol: TCP
SYCRI logs
Apr 14 07:08:39 sycri[30593]: E0414 07:08:39.924688 30593 main.go:276] /runtime.v1alpha2.RuntimeService/CreateContainer
Apr 14 07:08:39 sycri[30593]: Request: {"pod_sandbox_id":"a03bcbd94bbde213829948d35dd4a58a8d94752c48805185e2a87993d20d6d38","config":{"metadata":{"name":"nginx"},"image":{"image":"a76b355b668c43aca9432a3e8e15b2f17878966fbebadebcb7d45df68b314dd3"},"envs":[{"key":"KUBERNETES_SERVICE_HOST","value":"192.168.240.1"},{"key":"KUBERNETES_SERVICE_PORT","value":"443"},{"key":"KUBERNETES_SERVICE_PORT_HTTPS","value":"443"},{"key":"KUBERNETES_PORT","value":"tcp://192.168.240.1:443"},{"key":"KUBERNETES_PORT_443_TCP","value":"tcp://192.168.240.1:443"},{"key":"KUBERNETES_PORT_443_TCP_PROTO","value":"tcp"},{"key":"KUBERNETES_PORT_443_TCP_PORT","value":"443"},{"key":"KUBERNETES_PORT_443_TCP_ADDR","value":"192.168.240.1"}],"mounts":[{"container_path":"/var/run/secrets/kubernetes.io/serviceaccount","host_path":"/var/lib/kubelet/pods/8bff18c4-3e7b-4a0d-b732-883a3cef54b9/volumes/kubernetes.io~secret/default-token-85h4q","readonly":true,"selinux_relabel":true},{"container_path":"/etc/hosts","host_path":"/var/lib/kubelet/pods/8bff18c4-3e7b-4a0d-b732-883a3cef54b9/etc-hosts","selinux_relabel":true},{"container_path":"/dev/termination-log","host_path":"/var/lib/kubelet/pods/8bff18c4-3e7b-4a0d-b732-883a3cef54b9/containers/nginx/31a5bd44","selinux_relabel":true}],"labels":{"io.kubernetes.container.name":"nginx","io.kubernetes.pod.name":"hello-world-686ff49dc9-pv2rr","io.kubernetes.pod.namespace":"default","io.kubernetes.pod.uid":"8bff18c4-3e7b-4a0d-b732-883a3cef54b9"},"annotations":{"io.kubernetes.container.hash":"cf08d707","io.kubernetes.container.ports":"[{\"name\":\"web\",\"hostPort\":80,\"containerPort\":80,\"protocol\":\"TCP\"}]","io.kubernetes.container.restartCount":"0","io.kubernetes.container.terminationMessagePath":"/dev/termination-log","io.kubernetes.container.terminationMessagePolicy":"File","io.kubernetes.pod.terminationGracePeriod":"30"},"log_path":"nginx/0.log","linux":{"resources":{"cpu_period":100000,"cpu_shares":2,"oom_score_adj":1000},"security_context":{"namespace_options":{"network":2,"pid":1},"run_as_user":{},"seccomp_profile_path":"unconfined","masked_paths":["/proc/acpi","/proc/kcore","/proc/keys","/proc/latency_stats","/proc/timer_list","/proc/timer_stats","/proc/sched_debug","/proc/scsi","/sys/firmware"],"readonly_paths":["/proc/asound","/proc/bus","/proc/fs","/proc/irq","/proc/sys","/proc/sysrq-trigger"]}}},"sandbox_config":{"metadata":{"name":"hello-world-686ff49dc9-pv2rr","uid":"8bff18c4-3e7b-4a0d-b732-883a3cef54b9","namespace":"default"},"log_directory":"/var/log/pods/default_hello-world-686ff49dc9-pv2rr_8bff18c4-3e7b-4a0d-b732-883a3cef54b9","port_mappings":[{"container_port":80,"host_port":80}],"labels":{"io.kubernetes.pod.name":"hello-world-686ff49dc9-pv2rr","io.kubernetes.pod.namespace":"default","io.kubernetes.pod.uid":"8bff18c4-3e7b-4a0d-b732-883a3cef54b9","pod-template-hash":"686ff49dc9","run":"hello-world"},"annotations":{"kubernetes.io/config.seen":"2020-04-14T07:00:40.341542095-04:00","kubernetes.io/config.source":"api"},"linux":{"cgroup_parent":"/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod8bff18c4_3e7b_4a0d_b732_883a3cef54b9.slice","security_context":{"namespace_options":{"network":2,"pid":1}}}}}
Apr 14 07:08:39 sycri[30593]: Response: null
Apr 14 07:08:39 sycri[30593]: Error: rpc error: code = Internal desc = could not create container: could not spawn container: could not create oci bundle: could not create SIF bundle: failed to find loop device: failed to attach image /var/lib/singularity/a76b355b668c43aca9432a3e8e15b2f17878966fbebadebcb7d45df68b314dd3: no loop devices available
I'd like to +1 this issue. We also see it on Kubernetes 1.17.
We have found the reason why this happens.
When containers are stuck in a CrashLoopBackOff state, Singularity-CRI seems to exhaust the pool of loop devices faster than it can clean them up.
You can confirm this by comparing the number of attached loop devices to the number of loop device nodes in /dev:
losetup --list | wc -l
ls /dev/loop[0-9]* | wc -l
If the counts match, all of your available loop devices are in use.
I suppose you could try to increase the number of loop devices as a stopgap, but the real question is why Singularity is unable to clean up the old, unused ones.
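For reference, a rough sketch of both stopgaps; the /var/lib/singularity path matches the image store from the error above, and the 8-63 range below is only an example:

# Detach loop devices whose backing file lives in the Singularity-CRI image store.
# Only safe while no containers backed by those images are running.
for dev in $(losetup -a | grep '/var/lib/singularity/' | cut -d: -f1); do
    sudo losetup -d "$dev"
done

# Create additional loop device nodes (the loop driver uses block major 7).
for i in $(seq 8 63); do
    [ -e /dev/loop$i ] || sudo mknod -m 0660 /dev/loop$i b 7 $i
done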