buildx failed with: ERROR: failed to solve: rpc error: code = Unknown desc = open (arm64)
Contributing guidelines
- [X] I've read the contributing guidelines and wholeheartedly agree
I've found a bug, and:
- [X] The documentation does not mention anything about my problem
- [X] There are no open or closed issues that are related to my problem
Description
I recently started to see some failures, which so far seem to happen only on the arm64 runner (we run amd64 and arm64 in parallel on different runners).
#27 exporting to docker image format
#27 exporting layers
#27 exporting layers 35.7s done
#27 exporting manifest sha256:98a526cdb689c3794974e1c755ceb5231df925e5e1446847e02dc748bddf1bc6 done
#27 exporting config sha256:07187ea785e7d24d2beab5a8fde12c3709fc73a9079a587fb78b7b5902e98e57 done
#27 sending tarball
#27 ...
#28 importing to docker
#28 loading layer abecb16ce073 11.31kB / 11.31kB 0.1s done
#28 ERROR: open /var/lib/docker/overlay2/4d64ef876800e0f0b614fb7dd9698ec08025a8ee71ef0215f512b45cc038b5d8/.tmp-committed1515084540: no such file or directory
#27 exporting to docker image format
#27 sending tarball 6.4s done
#27 ERROR: rpc error: code = Unknown desc = open /var/lib/docker/overlay2/4d64ef876800e0f0b614fb7dd9698ec08025a8ee71ef0215f512b45cc038b5d8/.tmp-committed1515084540: no such file or directory
Example:
Expected behaviour
Pass the build
Actual behaviour
Failing
Repository URL
No response
Workflow run URL
No response
YAML workflow
-
Workflow logs
No response
BuildKit logs
https://github.com/ansible/ansible-dev-tools/actions/runs/11069017162/job/30755940313?pr=382#step:6:3610
Additional info
This runner is using ubuntu 24.04 and docker info reports
docker info
Client: Docker Engine - Community
Version: 27.2.1
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.16.2
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.29.2
Path: /usr/libexec/docker/cli-plugins/docker-compose
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 3
Server Version: 27.2.1
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 7f7fdf5fed64eb6a7caf99b3e12efcf9d60e311c
runc version: v1.1.14-0-g2c9f560
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: builtin
cgroupns
Kernel Version: 6.8.0-1016-aws
Operating System: Ubuntu 24.04.1 LTS
OSType: linux
Architecture: aarch64
CPUs: 2
Total Memory: 3.742GiB
Name: ip-10-0-1-209
ID: d13c7458-d626-491f-b9d9-5f0e2a661a08
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
This runner is using ubuntu 24.04
Seems to be a self-hosted runner from what I see in run logs: https://github.com/ansible/ansible-dev-tools/actions/runs/11069017162/job/30755940313?pr=382#step:1:2
Current runner version: '2.319.1'
Runner name: 'devtools-arm64-runner'
Runner group name: 'Default'
Machine name: 'ip-10-0-1-209'
Testing runner upgrade compatibility
It's quite hard to figure out what's going on, as there are nested composite actions in this repo. I also can't see the changes anymore on the related PR https://github.com/ansible/ansible-dev-tools/pull/382.
Can you create a small repro please? Thanks
I will. This error does not always reproduce; it seems to be random, with roughly a 1 in 4 chance of happening. I know that this might be related to the machine itself, but the reality is that the error is quite opaque, not giving any hints on why this might happen or where to look for details. As this is a permanent runner, I can easily get access to the logs. Once the PR gets in, it will be easier to look at the workflow.
not giving any hints on why this might happen or where to look for details. As this is a permanent runner, I can easily get access to the logs.
If you can give us the buildkit container logs that would help: https://docs.docker.com/build/ci/github-actions/configure-builder/#buildkit-container-logs
I had another run with --debug and got the logs in the GHA console at https://github.com/ansible/ansible-dev-tools/actions/runs/11071410923/job/30763513199?pr=382#step:9:49 - I guess you should be able to read them? No need to export them in any way, I hope.
#28 importing to docker
#28 loading layer abecb16ce073 11.31kB / 11.31kB 0.1s done
#28 ERROR: open /var/lib/docker/overlay2/4d64ef876800e0f0b614fb7dd9698ec08025a8ee71ef0215f512b45cc038b5d8/.tmp-committed1123400293: no such file or directory
Hmm, this might be an issue with the Docker engine on this self-hosted runner, as this happens when loading the image into the Docker store. Can you provide the Docker daemon logs?
Client: Docker Engine - Community
Version: 27.2.1
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.16.2
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.29.2
Path: /usr/libexec/docker/cli-plugins/docker-compose
Server:
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 3
Server Version: 27.2.1
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 7f7fdf5fed64eb6a7caf99b3e12efcf9d60e311c
runc version: v1.1.14-0-g2c9f560
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: builtin
cgroupns
Kernel Version: 6.8.0-1016-aws
Operating System: Ubuntu 24.04.1 LTS
OSType: linux
Architecture: aarch64
CPUs: 2
Total Memory: 3.742GiB
Name: ip-10-0-1-209
ID: d13c7458-d626-491f-b9d9-5f0e2a661a08
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
127.0.0.0/8
And looking at docker info https://github.com/ansible/ansible-dev-tools/actions/runs/11071410923/job/30763513199?pr=382#step:6:132, I wonder if this has something to do with Native Overlay Diff: true. Maybe disk space?
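To rule the disk space guess in or out, something like the following could be run on the runner just before the build step. It is only a rough sketch: the function name `check_free_space` and the 5 GiB threshold are made up for illustration and nothing in this thread confirms that low disk space is the cause. The path to check would be the Docker Root Dir reported by docker info (/var/lib/docker here).

```python
import shutil

# Hypothetical threshold, not taken from this thread; tune it to the size
# of the images being built.
MIN_FREE_BYTES = 5 * 1024**3


def check_free_space(path, min_free_bytes=MIN_FREE_BYTES):
    """Return (ok, free_bytes) for the filesystem that contains `path`.

    `ok` is True when at least `min_free_bytes` bytes are available.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free >= min_free_bytes, usage.free
```

Calling `check_free_space("/var/lib/docker")` on the arm64 runner and logging the result next to each failing run would show whether the failures line up with low free space.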