nomad-driver-containerd
The same image seems to be pulled in parallel causing disk exhaustion
We have about 100 parameterized job definitions that use the same image config:

config {
  image = "username/backend:some_tag"
}
The problem is that disk space is exhausted on Nomad clients, and it looks like the reason is that the image is being pulled individually for each job, despite every job specifying the exact same image with the same tag. With the docker Nomad driver this didn't happen: all jobs made use of a single image that was pulled and extracted once.
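For comparison, with the docker driver the sharing is easy to confirm, since a single locally stored image backs every container that uses it. A rough sketch of that check (image name as above):

# The image appears once no matter how many jobs reference it...
docker images username/backend

# ...while all the corresponding containers point at that one image.
docker ps --filter ancestor=username/backend:some_tag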
I might be wrong about the explanation, but this is what I gather from hundreds of error messages like:
[ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=62ab19a7-4e67-c941-cc39-340394800fa1 task=main error="rpc error: code = Unknown desc = Error in pulling image username/backend:some_tag: failed to prepare extraction snapshot \"extract-138110298-tmpn sha256:bf868a0e662ae83512efeacb6deb2e0f0f1694e693fab8f53c110cb503c00b99\": context deadline exceeded"
I.e., it looks like each allocation has its own extraction snapshot. Is it possible to configure the driver (or containerd) so that all jobs will share a single image snapshot?
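One way to check this directly is to ask containerd what it has stored. A rough sketch, assuming the driver pulls into a nomad containerd namespace (an assumption, hinted at by the nomad/ prefix in the error output further down this thread) and the default overlayfs snapshotter:

# Images in the namespace the driver uses; a shared content store should
# show the image only once.
ctr -n nomad images ls

# Snapshots in the same namespace; a separate extraction snapshot per
# allocation would show up as many entries over the same layers.
ctr -n nomad snapshots ls

# Approximate per-snapshot disk usage.
ctr -n nomad snapshots usage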
I noticed this was just implemented in the podman driver and it looks simple, so maybe it can be reused for containerd: https://github.com/hashicorp/nomad-driver-podman/commit/40db1ef0c5af9f2aff7829449af3d950b8ff59b9?diff=unified
@aartur I am not able to reproduce this. I tried to launch 10 jobs with the same image, golang:latest (image size is ~1 GB).

Before I launched the jobs (~55 GB of disk space available):
vagrant@vagrant:~/go/src/github.com/Roblox/nomad-driver-containerd$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 967M 0 967M 0% /dev
tmpfs 200M 6.5M 193M 4% /run
/dev/mapper/vagrant--vg-root 62G 4.4G 55G 8% /
$ nomad job status
root@vagrant:~/go/src/github.com/Roblox/nomad-driver-containerd/example# nomad status
ID Type Priority Status Submit Date
golang service 50 running 2022-08-01T17:57:07Z
golang-1 service 50 running 2022-08-01T17:57:45Z
golang-2 service 50 running 2022-08-01T17:58:06Z
golang-3 service 50 running 2022-08-01T17:58:51Z
golang-4 service 50 running 2022-08-01T17:59:03Z
golang-5 service 50 running 2022-08-01T17:59:09Z
golang-6 service 50 running 2022-08-01T17:59:22Z
golang-7 service 50 pending 2022-08-01T17:59:29Z
golang-8 service 50 pending 2022-08-01T17:59:34Z
golang-9 service 50 pending 2022-08-01T17:59:39Z
NOTE: the pending jobs are pending because memory is exhausted on my VM and Nomad is not able to place those allocations.
After the jobs are running, available disk space is still ~55 GB:
vagrant@vagrant:~/go/src/github.com/Roblox/nomad-driver-containerd$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 967M 0 967M 0% /dev
tmpfs 200M 6.5M 193M 4% /run
/dev/mapper/vagrant--vg-root 62G 4.4G 55G 8% /
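To see where the bytes actually go (or don't), the snapshotter and content store directories can also be checked directly. A sketch, assuming the default containerd root and the overlayfs snapshotter:

# Extracted snapshots (one writable working set per container).
sudo du -sh /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs

# Shared content store holding the pulled layer blobs.
sudo du -sh /var/lib/containerd/io.containerd.content.v1.content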
Also, I checked using nerdctl and I only see one image:
root@vagrant:~/go/src/github.com/Roblox/nomad-driver-containerd/example# nerdctl images
REPOSITORY TAG IMAGE ID CREATED SIZE
golang latest 19dde56d2309 30 minutes ago 1000.0 MiB
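For completeness, the same check can be repeated against the namespace the driver pulls into; nerdctl defaults to the default namespace, and the error output later in this thread shows refs prefixed with nomad/, so the namespace below is an assumption taken from those logs:

# List images in the (assumed) nomad namespace instead of the default one.
nerdctl --namespace nomad images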
I'm able to reproduce it by submitting 100 jobs with the following bash script:
#!/bin/bash
# Generate and submit 100 copies of the same job, all using the archlinux image.
for i in $(seq 1 100); do
  cat << EOT > job.nomad
job "bash_loop_$i" {
  datacenters = ["mydatacenter"]
  type        = "service"

  group "main" {
    task "main" {
      driver = "containerd-driver"

      config {
        image   = "archlinux"
        command = "/bin/bash"
        args    = ["-c", "while [ 1 ]; do sleep 1; done"]
      }

      resources {
        cpu    = 100
        memory = 30
      }
    }
  }
}
EOT
  echo "Running job $i"
  nomad job run -detach job.nomad
done
(mydatacenter needs to be adjusted.) When I observe disk space (by running watch -n1 'df -m /'), I see disk usage increase by about 25 GB. I also see error messages in the logs, e.g.:
containerd[1150]: time="2022-08-08T18:22:32.114740523+02:00" level=error msg="(*service).Write failed" error="rpc error: code = Unavailable desc = ref nomad/1/layer-sha256:e1deda52ffad5c9c8e3b7151625b679af50d6459630f4bf0fbf49e161dba4e88 locked for 15.811395992s (since 2022-08-08 18:22:15.868809674 +0200 CEST m=+478296.835995506): unavailable" expected="sha256:e1deda52ffad5c9c8e3b7151625b679af50d6459630f4bf0fbf49e161dba4e88" ref="layer-sha256:e1deda52ffad5c9c8e3b7151625b679af50d6459630f4bf0fbf49e161dba4e88" total=58926165
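The "locked for 15.8s" part suggests many allocations trying to write the same layer ref at the same time. One way to watch for that while the script above runs, sketched under the assumption that the driver uses the nomad containerd namespace:

# Active content ingests in containerd; many concurrent writers for the
# same layer-sha256:... ref would line up with the "locked" errors above.
watch -n1 'ctr -n nomad content active'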