docker compose build on 150 images that extend services from other files fails randomly

Open lambdafu opened this issue 1 year ago • 1 comments

Description

I have a compose.yml with 150 entries like this:

services:
  some-server-1:
    extends:
      file: images/some-server/compose.yml
      service: some-server-1
    ports:
      - '9005:9000'

where images/some-server/compose.yml defines multiple versions like this:

  some-server-1:
    image: org/some-server:1
    build:
      context: .
      args:
        VERSION: 1

and so on.

Trying to build this with docker compose build fails, often immediately, sometimes after a couple of steps, with:

[+] Building 0.0s (0/0)
failed to receive status: rpc error: code = Unknown desc = no such job b8zjod5gb6j3kcuk87ors39yu

If I split the workload by just building some of the images at a time, it works fine (the limit seems to be rather high, 130 or so).

Output of docker compose version:

$ docker compose version
Docker Compose version v2.10.2

Output of docker info:

marcus@ubuntu:~$ docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.9.1-docker)
  compose: Docker Compose (Docker Inc., v2.10.2)
  scan: Docker Scan (Docker Inc., v0.17.0)

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 20.10.18
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
 runc version: v1.1.4-0-g5fd4c4d
 init version: de40ad0
 Security Options:
  app-armor
  seccomp
   Profile: default
 Kernel Version: 5.4.0-125-generic
 Operating System: Ubuntu 20.04.5 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 15.63GiB
 Name: ubuntu
 ID: VTKK:OWDF:WM3A:MIO3:ZQV6:S4KS:GEKG:FSZB:FGYF:3L7Z:KYW4:2E72
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

lambdafu avatar Sep 13 '22 18:09 lambdafu

I can confirm this problem.

My current workaround is replacing docker compose build with:

docker compose config --services | xargs --max-args=5 --max-procs=1 docker compose build
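
For readers who prefer an explicit loop over xargs, the same batching idea can be sketched as a small POSIX shell function. This is only an illustration under stated assumptions: `build_in_batches` and the `BUILD_CMD` override are hypothetical names, not part of Docker Compose, and the batch size of 5 is just the value used in the command above.

```shell
# Hypothetical helper, not part of Docker Compose.
# Reads service names from stdin (one per line) and runs the build command
# on at most $1 services at a time, stopping at the first failing batch.
# BUILD_CMD is an assumed override hook; it defaults to "docker compose build".
build_in_batches() {
  batch_size="$1"
  set --                              # reuse positional parameters as the current batch
  while IFS= read -r svc; do
    [ -n "$svc" ] || continue         # skip empty lines
    set -- "$@" "$svc"
    if [ "$#" -ge "$batch_size" ]; then
      # </dev/null keeps the command from consuming the remaining service names
      ${BUILD_CMD:-docker compose build} "$@" </dev/null || return 1
      set --
    fi
  done
  # Build whatever is left over in the final, possibly smaller batch.
  if [ "$#" -gt 0 ]; then
    ${BUILD_CMD:-docker compose build} "$@" </dev/null || return 1
  fi
}
```

It would be used as `docker compose config --services | build_in_batches 5`. Unlike the xargs one-liner, the loop makes the stop-on-first-failure behaviour explicit, and setting `BUILD_CMD='echo docker compose build'` gives a dry run that only prints the batches.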

My system:

$ docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc., v0.8.2-docker)
  compose: Docker Compose (Docker Inc., 2.10.2)

Server:
 Containers: 11
  Running: 2
  Paused: 0
  Stopped: 9
 Images: 413
 Server Version: 20.10.17
 Storage Driver: devicemapper
  Pool Name: docker-254:2-2756959-pool
  Pool Blocksize: 65.54kB
  Base Device Size: 10.74GB
  Backing Filesystem: xfs
  Udev Sync Supported: true
  Data file: /dev/loop0
  Metadata file: /dev/loop1
  Data loop file: /var/lib/docker/devicemapper/devicemapper/data
  Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
  Data Space Used: 35.88GB
  Data Space Total: 107.4GB
  Data Space Available: 41.05GB
  Metadata Space Used: 114.5MB
  Metadata Space Total: 2.147GB
  Metadata Space Available: 2.033GB
  Thin Pool Minimum Free Space: 10.74GB
  Deferred Removal Enabled: true
  Deferred Deletion Enabled: true
  Deferred Deleted Device Count: 0
  Library Version: 1.02.185 (2022-05-18)
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6.m
 runc version:
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.19.7-arch1-1
 Operating System: Arch Linux
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 15.58GiB
 Name: jan-desktop
 ID: PZ2L:OHN4:MIL5:UWNU:5A6Q:T4Q5:NR2T:XYRW:QA3S:MVSC:5F6T:ZYL3
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: the devicemapper storage-driver is deprecated, and will be removed in a future release.
WARNING: devicemapper: usage of loopback devices is strongly discouraged for production use.
         Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.

Holzhaus avatar Sep 14 '22 09:09 Holzhaus

This seems to be buildit related or buildkit specific.

lambdafu avatar Sep 23 '22 12:09 lambdafu

> This seems to be buildit related or buildkit specific.

This looks similar to the issue described here, where we run into a timeout because buildkit does not register the job in time. That would explain why limiting parallelism works around the problem: it reduces the slowdown, so the operation no longer times out.

Holzhaus avatar Sep 23 '22 13:09 Holzhaus

https://github.com/moby/buildkit/issues/2088#issuecomment-920837407 is fixed, so I'm closing this issue as well.

As a side note, you can limit the number of concurrent builds using docker compose --parallel=X build, where X is the maximum number of builds to run at once.

ndeloof avatar May 03 '23 12:05 ndeloof