docker compose build on 150 images that extend services from other files fails randomly
Description
I have a compose.yml with 150 entries like this:
services:
  some-server-1:
    extends:
      file: images/some-server/compose.yml
      service: some-server-1
    ports:
      - '9005:9000'
where images/some-server/compose.yml defines multiple versions like this:
some-server-1:
  image: org/some-server:1
  build:
    context: .
    args:
      VERSION: 1
and so on.
Trying to build this with docker compose build fails, often immediately, sometimes after a couple of steps, with:
[+] Building 0.0s (0/0)
failed to receive status: rpc error: code = Unknown desc = no such job b8zjod5gb6j3kcuk87ors39yu
If I split the workload by just building some of the images at a time, it works fine (the limit seems to be rather high, 130 or so).
Output of docker compose version:
$ docker compose version
Docker Compose version v2.10.2
Output of docker info:
marcus@ubuntu:~$ docker info
Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Docker Buildx (Docker Inc., v0.9.1-docker)
compose: Docker Compose (Docker Inc., v2.10.2)
scan: Docker Scan (Docker Inc., v0.17.0)
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 20.10.18
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
runc version: v1.1.4-0-g5fd4c4d
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.4.0-125-generic
Operating System: Ubuntu 20.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 15.63GiB
Name: ubuntu
ID: VTKK:OWDF:WM3A:MIO3:ZQV6:S4KS:GEKG:FSZB:FGYF:3L7Z:KYW4:2E72
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
I can confirm this problem.
My current workaround is replacing docker compose build with:
docker compose config --services | xargs --max-args=5 --max-procs=1 docker compose build
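For anyone who prefers a loop over xargs, here is a rough sketch of the same batching idea as a plain shell function. The name build_in_batches is made up for this example, and it assumes service names contain no whitespace (which should hold for valid Compose service names). One difference: GNU xargs generally keeps going after a failed batch, whereas this sketch stops at the first batch that fails.

```shell
# Hypothetical helper (not part of Docker or Compose): reads service names
# from stdin, one per line, and runs the given command once per batch.
# Usage: build_in_batches BATCH_SIZE COMMAND [ARGS...]
build_in_batches() {
  batch_size=$1
  shift
  batch=""
  count=0
  while IFS= read -r service; do
    # Skip empty lines.
    [ -n "$service" ] || continue
    batch="$batch $service"
    count=$((count + 1))
    if [ "$count" -ge "$batch_size" ]; then
      # $batch is unquoted on purpose so it splits into separate arguments;
      # this relies on the assumption that names contain no whitespace.
      # stdin is redirected so the command cannot consume the service list.
      "$@" $batch </dev/null || return
      batch=""
      count=0
    fi
  done
  # Run the final, possibly smaller, batch.
  if [ "$count" -gt 0 ]; then
    "$@" $batch </dev/null || return
  fi
}

# Example invocation (commented out so the snippet can be sourced safely):
# docker compose config --services | build_in_batches 5 docker compose build
```

The batch size of 5 just mirrors the xargs command above; anything comfortably below the point where the failure shows up should behave the same way.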
My system:
$ docker info
Client:
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc., v0.8.2-docker)
compose: Docker Compose (Docker Inc., 2.10.2)
Server:
Containers: 11
Running: 2
Paused: 0
Stopped: 9
Images: 413
Server Version: 20.10.17
Storage Driver: devicemapper
Pool Name: docker-254:2-2756959-pool
Pool Blocksize: 65.54kB
Base Device Size: 10.74GB
Backing Filesystem: xfs
Udev Sync Supported: true
Data file: /dev/loop0
Metadata file: /dev/loop1
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Data Space Used: 35.88GB
Data Space Total: 107.4GB
Data Space Available: 41.05GB
Metadata Space Used: 114.5MB
Metadata Space Total: 2.147GB
Metadata Space Available: 2.033GB
Thin Pool Minimum Free Space: 10.74GB
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.185 (2022-05-18)
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6.m
runc version:
init version: de40ad0
Security Options:
seccomp
Profile: default
cgroupns
Kernel Version: 5.19.7-arch1-1
Operating System: Arch Linux
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 15.58GiB
Name: jan-desktop
ID: PZ2L:OHN4:MIL5:UWNU:5A6Q:T4Q5:NR2T:XYRW:QA3S:MVSC:5F6T:ZYL3
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: the devicemapper storage-driver is deprecated, and will be removed in a future release.
WARNING: devicemapper: usage of loopback devices is strongly discouraged for production use.
Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
This seems to be BuildKit-related.
This looks similar to the issue described here, where we run into a timeout because BuildKit does not register the job in time. That would explain why limiting parallelism works around the problem: it reduces the slowdown, so the operation no longer times out.
https://github.com/moby/buildkit/issues/2088#issuecomment-920837407 is fixed, so I'm closing this issue as well.
As a side note, you can limit the number of concurrent builds using docker compose --parallel=X build
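Spelled out with a concrete value, that would look roughly like the following. The limit of 4 is an arbitrary example, and both forms assume a Compose release recent enough to support them, which may not include the v2.10.2 reported above.

```shell
# Cap Compose at 4 concurrent operations for this build (4 is an example value).
docker compose --parallel 4 build

# The same cap via the COMPOSE_PARALLEL_LIMIT environment variable,
# assuming your Compose version honors it.
COMPOSE_PARALLEL_LIMIT=4 docker compose build
```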