Builds stuck in "preparing build cache for export" stage
Contributing guidelines and issue reporting guide
- [x] I've read the contributing guidelines and wholeheartedly agree. I've also read the issue reporting guide.
Well-formed report checklist
- [x] I have found a bug and the documentation does not mention anything about my problem
- [x] I have found a bug and there are no open or closed issues related to my problem
- [x] I have provided version/information about my environment and done my best to provide a reproducer
Description of bug
I am seeing buildkit fail to leave the "preparing build cache for export" stage when building a large, multi-stage image. It does pass from time to time (after ~1-2 hours), but mostly it seems to be stuck in the checkLoops/removeLoops functions, with CPU pegged at 100%.
The docker buildx build invocation specifies "registry" cache with two --cache-from (type=registry) flags (same repo, two different tags, if that makes a difference) and a single --cache-to (type=registry).
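For reference, the invocation looks roughly like the sketch below; registry, repository, and tag names are placeholders rather than the real values:

# two registry cache sources (same repo, two tags) plus a single registry cache export
docker buildx build \
  --cache-from type=registry,ref=REGISTRY/REPO:cache-a \
  --cache-from type=registry,ref=REGISTRY/REPO:cache-b \
  --cache-to type=registry,ref=REGISTRY/REPO:cache-a \
  -t REGISTRY/REPO:app .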
pprof output:
(pprof) top50 -cum
Showing nodes accounting for 118.43s, 98.06% of 120.77s total
Dropped 139 nodes (cum <= 0.60s)
flat flat% sum% cum cum%
0 0% 0% 119.92s 99.30% github.com/moby/buildkit/cache/remotecache/v1.(*CacheChains).Marshal
0 0% 0% 119.92s 99.30% github.com/moby/buildkit/cache/remotecache/v1.(*CacheChains).normalize
9.64s 7.98% 7.98% 119.92s 99.30% github.com/moby/buildkit/cache/remotecache/v1.(*normalizeState).checkLoops
0 0% 7.98% 119.92s 99.30% github.com/moby/buildkit/cache/remotecache/v1.(*normalizeState).removeLoops
0 0% 7.98% 119.90s 99.28% github.com/moby/buildkit/cache/remotecache.(*contentCacheExporter).Finalize
0 0% 7.98% 119.70s 99.11% github.com/moby/buildkit/solver/llbsolver.runCacheExporters.func1.1
0 0% 7.98% 119.20s 98.70% github.com/moby/buildkit/solver/llbsolver.inBuilderContext.func1
0 0% 7.98% 117.89s 97.62% github.com/moby/buildkit/solver.(*Job).InContext
0 0% 7.98% 116.29s 96.29% github.com/moby/buildkit/solver/llbsolver.inBuilderContext
0 0% 7.98% 114.14s 94.51% github.com/moby/buildkit/solver/llbsolver.runCacheExporters.func1
0 0% 7.98% 111.42s 92.26% golang.org/x/sync/errgroup.(*Group).Go.func1
22.34s 18.50% 26.48% 33.73s 27.93% runtime.mapiternext
6.20s 5.13% 31.61% 32.42s 26.84% runtime.mapiterinit
14.93s 12.36% 43.98% 28.40s 23.52% runtime.mapaccess2_faststr
17.66s 14.62% 58.60% 17.66s 14.62% aeshashbody
0.61s 0.51% 59.10% 14.30s 11.84% github.com/moby/buildkit/cache/remotecache/v1.(*normalizeState).checkLoops.func1
7.89s 6.53% 65.64% 13.69s 11.34% runtime.mapdelete_faststr
5.69s 4.71% 70.35% 11.06s 9.16% runtime.mapassign_faststr
9.67s 8.01% 78.36% 9.67s 8.01% runtime.add (inline)
5.69s 4.71% 83.07% 9.03s 7.48% runtime.mapaccess2_fast64
2.53s 2.09% 85.16% 6.27s 5.19% runtime.(*bmap).overflow (inline)
2.39s 1.98% 87.14% 6.04s 5.00% runtime.rand
3.51s 2.91% 90.05% 3.51s 2.91% runtime.isEmpty (inline)
0.22s 0.18% 90.23% 3.35s 2.77% internal/chacha8rand.(*State).Refill
3.13s 2.59% 92.82% 3.13s 2.59% internal/chacha8rand.block
1.94s 1.61% 94.43% 1.94s 1.61% runtime.memhash64
1.43s 1.18% 95.61% 1.43s 1.18% runtime.strhash
0.88s 0.73% 96.34% 0.88s 0.73% runtime.tophash (inline)
0.71s 0.59% 96.93% 0.71s 0.59% internal/abi.(*Type).Pointers (inline)
0.68s 0.56% 97.49% 0.68s 0.56% runtime.duffzero
0.06s 0.05% 97.54% 0.66s 0.55% runtime.bucketMask (inline)
0.63s 0.52% 98.06% 0.63s 0.52% runtime.bucketShift (inline)
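For anyone wanting to capture a similar profile: buildkitd serves the standard Go pprof endpoints on the gRPC debug address (see the buildkitd.toml below); the host here is a placeholder and the port matches that config:

go tool pprof -seconds 30 http://BUILDER_HOST:6060/debug/pprof/profile
# then, at the interactive prompt:
# (pprof) top50 -cum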
Reproduction
It may be difficult to reproduce and I cannot share the Dockerfile as it is private, but the issue does appear from time to time, and I believe #2009 is related.
Version information
Running buildkitd in docker-container mode, v0.21.1 (the current moby/buildkit:buildx-stable-1).
~$ docker buildx version && docker buildx inspect
github.com/docker/buildx v0.24.0 d0e5e86
Name: gha-runner-vm-builder
Driver: docker-container
Last Activity: 2025-06-03 14:52:14 +0000 UTC
Nodes:
Name: gha-runner-vm-builder0
Endpoint: unix:///var/run/docker.sock
Driver Options: network="host"
Status: running
BuildKit daemon flags: --allow-insecure-entitlement=network.host
BuildKit version: v0.21.1
Platforms: linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/386
Labels:
org.mobyproject.buildkit.worker.executor: oci
org.mobyproject.buildkit.worker.hostname: mayhem-gha-runner
org.mobyproject.buildkit.worker.network: host
org.mobyproject.buildkit.worker.oci.process-mode: sandbox
org.mobyproject.buildkit.worker.selinux.enabled: false
org.mobyproject.buildkit.worker.snapshotter: overlayfs
buildkitd.toml:
> debug = true
>
> [grpc]
> debugAddress = "0.0.0.0:6060"
>
> [worker]
>
> [worker.oci]
> gc = false
>
and
~$ docker version && docker info
Client: Docker Engine - Community
Version: 28.2.2
API version: 1.50
Go version: go1.24.3
Git commit: e6534b4
Built: Fri May 30 12:07:27 2025
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 28.2.2
API version: 1.50 (minimum version 1.24)
Go version: go1.24.3
Git commit: 45873be
Built: Fri May 30 12:07:27 2025
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.7.27
GitCommit: 05044ec0a9a75232cad458027ca83437aae3f4da
runc:
Version: 1.2.5
GitCommit: v1.2.5-0-g59923ef
docker-init:
Version: 0.19.0
GitCommit: de40ad0
Client: Docker Engine - Community
Version: 28.2.2
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.24.0
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.36.2
Path: /usr/libexec/docker/cli-plugins/docker-compose
Server:
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 108
Server Version: 28.2.2
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
CDI spec directories:
/etc/cdi
/var/run/cdi
Swarm: inactive
Runtimes: runc io.containerd.runc.v2
Default Runtime: runc
Init Binary: docker-init
containerd version: 05044ec0a9a75232cad458027ca83437aae3f4da
runc version: v1.2.5-0-g59923ef
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: builtin
Kernel Version: 6.8.0-60-generic
Operating System: Ubuntu 24.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 48
Total Memory: 47.03GiB
Name: clint-vm-1
ID: f696c190-9a0a-4598-b3a1-98f47405a8f0
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
::1/128
127.0.0.0/8
Live Restore Enabled: false
There is a very similar trace reported here: https://github.com/earthly/earthly/issues/1187#issuecomment-992601007
Do you have a reproducer?
Sadly no, the issue is intermittent and goes away after killing the cache, which I had to do to unblock production CI/CD :(
Hi, I'm having the exact same issue. It only happens when I'm trying to push the cache to a registry. This is the --print output of my build:
"cache-from": [
{
"ref": "XXX.dkr.ecr.eu-west-1.amazonaws.com/<IMAGE>:cache",
"type": "registry"
}
],
"cache-to": [
{
"mode": "max",
"ref": "XXX.dkr.ecr.eu-west-1.amazonaws.com/<IMAGE>:cache",
"type": "registry"
}
],
"output": [
{
"type": "registry"
}
Client: Docker Engine - Community
Version: 28.2.2
API version: 1.50
Go version: go1.24.3
Git commit: e6534b4
Built: Fri May 30 12:07:28 2025
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 28.2.2
API version: 1.50 (minimum version 1.24)
Go version: go1.24.3
Git commit: 45873be
Built: Fri May 30 12:07:28 2025
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.7.27
GitCommit: 05044ec0a9a75232cad458027ca83437aae3f4da
runc:
Version: 1.2.5
GitCommit: v1.2.5-0-g59923ef
docker-init:
Version: 0.19.0
GitCommit: de40ad0
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.24.0
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.36.2
Path: /home/user/.docker/cli-plugins/docker-compose
It's a large image with a few multi-stage steps, and the cache is around 3 GB. I was able to push it once to the registry and it didn't take long, but now it always gets stuck at "preparing build cache for export".
Yesterday I left it running for ~5 hours and it still didn't finish.
Also, even after I cancel the command, the buildkitd process keeps running indefinitely at ~40% CPU.
Hello, I believe I am seeing this as well. I tried updating to 0.23.1 but am still seeing the same issue.
I'm seeing this issue with raw buildctl and buildkitd in 0.23.1 (both rootless and non-rootless mode) using multiple DockerHub image imports/exports such as:
--export-cache type=registry,ref=org/repo:tag,mode=max,compression=zstd,force-compression=true,compression-level=3,oci-mediatypes=true,ignore-error=true
I don't have specific steps to reproduce yet, but it's happening extremely consistently now.
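For context, the full invocation is along these lines; the frontend, local context paths, and image ref are illustrative rather than the exact command:

buildctl build \
  --frontend dockerfile.v0 \
  --local context=. \
  --local dockerfile=. \
  --output type=image,name=org/repo:image,push=true \
  --import-cache type=registry,ref=org/repo:tag \
  --export-cache type=registry,ref=org/repo:tag,mode=max,compression=zstd,force-compression=true,compression-level=3,oci-mediatypes=true,ignore-error=true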
Killing the build cache registry seems to recover it for me, only for it to regress again after some time. It seems like some sort of cache data corruption issue related to these data structures, where cyclic(?) references are accidentally introduced and the validation logic then gets stuck iterating over these loops.
@razzmatazz yeah, the only thing that really fixes it for me is completely deleting my Jenkins workers and starting from scratch.
Rolling back to 0.22.x seems to have fixed this problem for us.
> Rolling back to 0.22.x seems to have fixed this problem for us.
Thanks, we were plagued by this issue all last week, and the moment we reverted to 0.22.x (both server & client) everything settled down for us.
We tried various things: purging the cache in the registry, wiping/resizing our local cache volumes, tuning garbage collection policies, ignoring cache errors, etc. Nothing really made the problem go away permanently.
If you can reproduce this and think it is a regression, can you bisect the issue to find the breaking point?
https://github.com/moby/buildkit/blob/master/.github/issue_reporting_guide.md#regressions
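One way to narrow down the breaking point is to pin specific BuildKit releases for the docker-container driver and rerun the failing build against each; the builder name and versions below are just examples:

# create a builder pinned to a known-good release and rerun the failing build against it
docker buildx create --name bisect-v0220 --driver docker-container \
  --driver-opt image=moby/buildkit:v0.22.0 --use --bootstrap
# ...run the failing build...
# then remove the builder and repeat with v0.23.0, v0.23.1, etc. until the first bad release is found
docker buildx rm bisect-v0220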
> Rolling back to 0.22.x seems to have fixed this problem for us.
Actually, I have been seeing this (intermittently) for about a year or two now, so I am not sure this is a recent regression.
I'm not 100% sure, but it seems I'm experiencing the same issue.
My use case: In a Docker-in-Docker environment, I prepare and publish 100+ custom Kibana Docker images.
I use:
BUILDX_VER="v0.25.0"
BUILDKIT_VER="v0.23.2"
I create a builder like this:
docker buildx create \
--name "$BUILDER_NAME" \
--driver docker-container \
--driver-opt "image=moby/buildkit:$BUILDKIT_VER" \
--buildkitd-flags "--debug --trace" \
--platform linux/amd64,linux/arm64 \
--use --bootstrap
Image is built and pushed like this:
+ docker buildx build --no-cache --platform linux/arm64,linux/amd64 --push --build-arg KBN_VERSION=8.14.1 --build-arg ROR_PLUGIN_PATH=builds/readonlyrest_kbn_universal-1.64.2_es8.14.1.zip -f Dockerfile -t beshultd/kibana-readonlyrest:8.14.1-ror-1.64.2 -t beshultd/kibana-readonlyrest:8.14.1-ror-latest .
#0 building with "ror_kbn_builder_1751882513" instance using docker-container driver
What do I experience? After releasing several images, the build gets stuck, seemingly in the exporting-image phase. See attached log.
Previously: With old versions of Buildx and Buildkit, I had this: https://github.com/moby/buildkit/issues/5784#issuecomment-2906722423
Workaround: Every 5 published versions, I remove the builder and its container and create a new one.
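The workaround amounts to something like this (builder name and BuildKit version variables as in the create command above):

docker buildx rm "$BUILDER_NAME"   # removes the builder and, with the docker-container driver, its container
docker buildx create \
  --name "$BUILDER_NAME" \
  --driver docker-container \
  --driver-opt "image=moby/buildkit:$BUILDKIT_VER" \
  --buildkitd-flags "--debug --trace" \
  --platform linux/amd64,linux/arm64 \
  --use --bootstrap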
Version information:
docker buildx version && docker buildx inspect
github.com/docker/buildx v0.25.0 faaea65da4ba0e58a13cd9cadcb950c51cf3b3c9
Name: ror_kbn_builder_1751884090
Driver: docker-container
Last Activity: 2025-07-07 10:28:14 +0000 UTC
Nodes:
Name: ror_kbn_builder_17518840900
Endpoint: unix:///var/run/docker.sock
Driver Options: image="moby/buildkit:v0.23.2"
Status: running
BuildKit daemon flags: --debug --trace --allow-insecure-entitlement=network.host
BuildKit version: v0.23.2
Platforms: linux/amd64*, linux/arm64*, linux/arm/v7, linux/arm/v6
Features:
Automatically load images to the Docker Engine image store: false
Cache export: true
Direct push: true
Docker exporter: true
Multi-platform build: true
OCI exporter: true
Labels:
org.mobyproject.buildkit.worker.executor: oci
org.mobyproject.buildkit.worker.hostname: 280ecd186224
org.mobyproject.buildkit.worker.network: host
org.mobyproject.buildkit.worker.oci.process-mode: sandbox
org.mobyproject.buildkit.worker.selinux.enabled: false
org.mobyproject.buildkit.worker.snapshotter: overlayfs
GC Policy rule#0:
All: false
Filters: type==source.local,type==exec.cachemount,type==source.git.checkout
Keep Duration: 48h0m0s
Max Used Space: 488.3MiB
GC Policy rule#1:
All: false
Keep Duration: 1440h0m0s
Reserved Space: 7.451GiB
Max Used Space: 55.88GiB
Min Free Space: 13.97GiB
GC Policy rule#2:
All: false
Reserved Space: 7.451GiB
Max Used Space: 55.88GiB
Min Free Space: 13.97GiB
GC Policy rule#3:
All: true
Reserved Space: 7.451GiB
Max Used Space: 55.88GiB
Min Free Space: 13.97GiB
docker version && docker info
Client:
Version: 26.1.4
API version: 1.45
Go version: go1.21.11
Git commit: 5650f9b
Built: Wed Jun 5 11:27:58 2024
OS/Arch: linux/arm64
Context: default
Server: Docker Engine - Community
Engine:
Version: 26.1.4
API version: 1.45 (minimum version 1.24)
Go version: go1.21.11
Git commit: de5c9cf
Built: Wed Jun 5 11:29:18 2024
OS/Arch: linux/arm64
Experimental: false
containerd:
Version: v1.7.18
GitCommit: ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
runc:
Version: 1.1.12
GitCommit: v1.1.12-0-g51d5e94
docker-init:
Version: 0.19.0
GitCommit: de40ad0
Client:
Version: 26.1.4
Context: default
Debug Mode: true
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.25.0
Path: /usr/local/lib/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.27.0
Path: /usr/local/lib/docker/cli-plugins/docker-compose
Server:
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 1
Server Version: 26.1.4
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
runc version: v1.1.12-0-g51d5e94
init version: de40ad0
Security Options:
seccomp
Profile: builtin
cgroupns
Kernel Version: 6.1.0-21-arm64
Operating System: Ubuntu 20.04.6 LTS
OSType: linux
Architecture: aarch64
CPUs: 4
Total Memory: 7.567GiB
Name: 8747dd7514fa
ID: 14918813-099e-43ca-bc02-9602a7aa0fbe
Docker Root Dir: /var/lib/docker
Debug Mode: true
File Descriptors: 41
Goroutines: 58
System Time: 2025-07-07T10:29:51.984889697Z
EventsListeners: 0
Username: readonlyrestkbn
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
I couldn't create a reproducible environment outside my project, but I kept hitting this issue with what I have, so I created a PR that fixed it for my project: #6082. I'm assuming it has something to do with multi-stage builds, because it happens more consistently with some of my images, but that's just a guess.
@razzmatazz your profiling data helped narrow it down. @coutoPL and @korbin since you seem to have this happening consistently maybe you can give this a go?
I understand that my PR doesn't completely solve the issue, but I have good news! I was able to create a reproducible environment, @tonistiigi. The first and second time I run the build command it works fine, but on the 3rd and 4th run it gets stuck in the checkLoops function.
I reproduced this while using ECR as cache registry.
The build command is stuck on
=> [test_sub1] exporting cache to registry
=> => preparing build cache for export
To reproduce
Run this build command 3-4 times: docker compose -f docker-compose.yml build test_service
Use this docker-compose.yml file
services:
  test_service:
    build:
      cache_from:
        - XXXXXXXXXXXX.dkr.ecr.eu-west-1.amazonaws.com/test-repo:test_service
      cache_to:
        - type=registry,mode=max,ref=XXXXXXXXXXXX.dkr.ecr.eu-west-1.amazonaws.com/test-repo:test_service
      x-bake:
        output: type=registry
      additional_contexts:
        test_sub1: service:test_sub1
        test_sub2: service:test_sub2
      dockerfile_inline: |
        FROM alpine as deps
        COPY --from=test_sub1 /file1 /file1
        COPY --from=test_sub2 /deps1 /deps1
        FROM deps as prod-deps
        RUN echo "This is a production dependency stage" > /file1
        FROM alpine as dev
        COPY --from=deps /file1 /file1
        COPY --from=deps /deps1 /deps1
        FROM alpine as prod
        COPY --from=deps /deps1 /deps1
        COPY --from=prod-deps /file1 /file1
  test_sub1:
    build:
      cache_from:
        - XXXXXXXXXXXX.dkr.ecr.eu-west-1.amazonaws.com/test-repo:test_sub1
      cache_to:
        - type=registry,mode=max,ref=XXXXXXXXXXXX.dkr.ecr.eu-west-1.amazonaws.com/test-repo:test_sub1
      x-bake:
        output: type=registry
      additional_contexts:
        test_sub2: service:test_sub2
        test_innersub1: service:test_innersub1
      dockerfile_inline: |
        FROM alpine as deps
        COPY --from=test_sub2 /file1 /file1
        COPY --from=test_innersub1 /deps1 /deps1
        FROM alpine
        COPY --from=deps /file1 /file1
        RUN touch /file2
  test_sub2:
    build:
      cache_from:
        - XXXXXXXXXXXX.dkr.ecr.eu-west-1.amazonaws.com/test-repo:test_sub2
      cache_to:
        - type=registry,mode=max,ref=XXXXXXXXXXXX.dkr.ecr.eu-west-1.amazonaws.com/test-repo:test_sub2
      x-bake:
        output: type=registry
      dockerfile_inline: |
        FROM alpine as deps
        RUN touch /deps1
        RUN touch /deps2
        FROM alpine
        COPY --from=deps /deps1 /deps1
        RUN touch /file1
  test_innersub1:
    build:
      cache_from:
        - XXXXXXXXXXXX.dkr.ecr.eu-west-1.amazonaws.com/test-repo:test_innersub1
      cache_to:
        - type=registry,mode=max,ref=XXXXXXXXXXXX.dkr.ecr.eu-west-1.amazonaws.com/test-repo:test_innersub1
      x-bake:
        output: type=registry
      dockerfile_inline: |
        FROM alpine as deps
        RUN touch /deps1
        RUN touch /deps2
        FROM alpine
        COPY --from=deps /deps1 /deps1
        RUN touch /file1
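To run the repro against a private ECR repository you first need to log in to the registry; the account ID and region below match the placeholders in the compose file and assume a configured AWS CLI:

aws ecr get-login-password --region eu-west-1 | \
  docker login --username AWS --password-stdin XXXXXXXXXXXX.dkr.ecr.eu-west-1.amazonaws.com
# then run the build 3-4 times:
docker compose -f docker-compose.yml build test_service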
Ran the repro steps from @israelglar (thank you!), with some modifications to push to my own ECR and such of course.
Info: BuildKit: github.com/moby/buildkit v0.20.0 121ecd5b9083b8eef32183cd404dd13e15b4a3df
It ran fine 6 times, so I shrank my cache volume down to 2 GiB and tried again. The builds still ran fine for another 6 rounds, until I decided to force one of the cross-target files to be large. In the example above, I just changed the RUN touch /file1 instruction for test_sub2 to:
RUN dd if=/dev/urandom of=/file1 bs=1024 count=1G
Which generates a huge file of course, and promptly filled up my buildkit volume.
From there, I reverted my change and re-ran the build - and now it fails every time. The large partial blob was removed from the cache, per the buildkitd logs and watching the volume stats, but every attempt to build now hangs on [test_service] exporting cache to registry.
Watching the debug logs, I get a pretty consistent set of entries about removing content, and then a 'schedule content cleanup' message - then nothing at all until I manually kill the build. Once it's killed, I get the expected messages about sessions finishing, and things look to be back to normal. I let it go for 15 minutes, and got the same results of 'a void' in the logs.
An unrelated simple build with ECR caching enabled worked fine repeatedly, so it certainly appears to be something about that specific blob being cached from that specific image that has things hung up.
As a quick check, I added a couple of simple RUN echo "hi there" instructions around the touch /file1 bit that I had tweaked earlier, and it still hangs in the same place. I then decided to dig into the actual blob contents to see if the allegedly removed blobs were still around, and in looking up their digests I realized the same 4 blobs are 'removed' in the logs on each build, yet they still seem to exist in the cache.
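If anyone wants to double-check this from the client side, one rough way is to list the cache records and grep for the digests in question; this is only a sketch, since the IDs buildctl prints are cache record IDs and only some of them correspond directly to blob digests:

buildctl --addr tcp://localhost:9800 du --verbose | grep -A 5 sha256:DIGEST_PREFIX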
From there I opted to just do a buildctl prune-histories, and while most of the builds were purged from the history it did throw a handful of errors:
error: 9 errors occurred:
* lease "ref_u8c4dw5qbl4gil9juyt7a3kzc": not found
* lease "ref_vii9iacmuzi8yvqdtaug206rg": not found
* lease "ref_2urka2vkq026jynr24f6u7soq": not found
[ +6 more ]
Not sure I can find time to do a better deep dive, but it's certainly a reliable reproduction method.
[Edit] Forgot my config; the only interesting bit is the GC setup:
debug = true
root = "/var/lib/buildkit"
[log]
format = "json"
[history]
maxEntries = 500
[worker.oci]
enabled = true
gc = true
maxUsedSpace = "90%"
[worker.containerd]
enabled = false
Huh, interestingly, I was able to restart buildkitd and it freed up the previously errored build histories, and then I was able to prune the full cache... but a 'fresh' attempt at the docker compose build still hung, and three of the 4 build histories it created couldn't be deleted (same 'lease not found' error). Will do some more digging later, but thought I'd add that bit at least.
Great news! @tonistiigi either found a solution or is on the right path. His PR #6129 fixes the issue on my end. Maybe you can take a look at it, @cboggs?
These are the results I've got:
==== Runs with current cache storage ====
Run #1 Duration: 12 seconds (push cache)
Run #2 Duration: 4 seconds (only pull)
Run #3 Duration: 4 seconds (only pull)
Run #4 Duration: 3 seconds (only pull)
Run #5 Duration: 5 seconds (only pull)
Run #6 Duration: 60 seconds + canceled (stuck)
==== Runs with new cache storage ====
Run #1 Duration: 9 seconds (push cache)
Run #2 Duration: 4 seconds (only pull)
Run #3 Duration: 3 seconds (only pull)
Run #4 Duration: 4 seconds (only pull)
Run #5 Duration: 3 seconds (only pull)
Run #6 Duration: 3 seconds (only pull)
Run #7 Duration: 3 seconds (only pull)
Run #8 Duration: 3 seconds (only pull)
Run #9 Duration: 3 seconds (only pull)
Run #10 Duration: 3 seconds (only pull)
@israelglar Sweet! I'll give it a shot. It took me a bit to get it to repro on my existing setup for whatever reason, but I'll get the new build cranked out and running, then follow up!
Woohoo! @israelglar @tonistiigi Confirmed that the branch for #6129 does the trick.
- Reproduced the issue on 0.20.0 just to be sure I wasn't fudging the results.
- Built binaries from Tõnis' branch via docker buildx bake binaries and copied them to my test server (rough commands in the sketch after this list).
- Stopped buildkitd on the server, removed the buildkit root dir (rm -rf), moved the new binaries into /usr/local/bin, and restarted buildkitd.
> buildctl --addr tcp://localhost:9800 debug info
BuildKit: github.com/moby/buildkit v0.24.0-rc2-18-gaed2e4a19 aed2e4a1929f330b42a9557fd152510f1668f390
- Re-ran repro steps from before, several times, and observed:
a. build fails when disk fills up, as it should
b. buildctl commands are still usable while disk is full
c. can manually run buildctl prune successfully
d. after waiting a minute or so, buildkit cleans up the 'breaking' snapshot on its own
e. subsequent builds without the forced failure succeed just fine, both with/without changes to the layers and with/without existing cache manifests in ECR
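For anyone else wanting to test the branch, the build/install steps I followed look roughly like the sketch below; the PR checkout and the output path are approximations, not exact commands:

git clone https://github.com/moby/buildkit.git && cd buildkit
git fetch origin pull/6129/head && git checkout FETCH_HEAD
docker buildx bake binaries
# copy the resulting buildkitd/buildctl binaries to the test server
# (the output directory depends on the bake definition; ./bin/build here is an assumption):
scp ./bin/build/* user@test-server:/usr/local/bin/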
All in all... seems good to go! I'll add a comment to the linked PR pointing to this comment to indicate that it should likely close this issue. :-)