buildkit
docker buildx prune --filter="until=xyz" marks unrelated cache layers as "last used" and does not delete parents
Description
When running e.g.
docker buildx prune --filter="until=25s"
after it deletes some cache layers, the parents of those layers are marked as just used: their "last used" timestamp is reset to the current time. As a result, prune cannot remove the entire subtree in one run; it removes only one leaf layer at a time.
Reproduce
This is hard to reproduce on demand, so I'm providing a screenshot from a real CI run. I built a quick Python tool that renders the output of du and prune as a colored tree.
I run du. Look at cache id=byc4z0pb2ba29tm25nqbrdcpk (underlined with red) and its parent lac46mmewlr8bqd5f7ii95hgd (underlined with green). They were both last used 3 minutes ago.
Then, docker buildx prune --filter="until=25s" removes the old unreferenced caches, and it removes the red cache byc4z0pb2ba29tm25nqbrdcpk (which is correct). For some reason, it doesn't remove its green parent lac46mmewlr8bqd5f7ii95hgd (although it theoretically should).
After pruning, I run du again. Look at what happened to the green parent lac46mmewlr8bqd5f7ii95hgd (follow the arrows): it is now "Last used 1 second ago", whereas before pruning it was "Last used 3 minutes ago". In other words, prune updates the timestamp of a cache record it does not delete. I think this may also be why it doesn't delete the green parent: having just touched it, prune no longer treats it as older than 25s.
Expected behavior
- In that screenshot, both caches (red byc4z0pb2ba29tm25nqbrdcpk and its green parent lac46mmewlr8bqd5f7ii95hgd) should have been pruned, because both were last used more than 25s ago. But only the leaf cache was pruned.
- Or at least, the timestamp of the green parent lac46mmewlr8bqd5f7ii95hgd should not be modified by pruning.
docker version
Client: Docker Engine - Community
 Version:           27.3.1
 API version:       1.47
 Go version:        go1.22.7
 Git commit:        ce12230
 Built:             Fri Sep 20 11:41:08 2024
 OS/Arch:           linux/arm64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          27.3.1
  API version:      1.47 (minimum version 1.24)
  Go version:       go1.22.7
  Git commit:       41ca978
  Built:            Fri Sep 20 11:41:08 2024
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.7.22
  GitCommit:        7f7fdf5fed64eb6a7caf99b3e12efcf9d60e311c
 runc:
  Version:          1.1.14
  GitCommit:        v1.1.14-0-g2c9f560
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
docker info
Client: Docker Engine - Community
 Version:    27.3.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.17.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.29.7
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 1
  Running: 0
  Paused: 0
  Stopped: 1
 Images: 1
 Server Version: 27.3.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7f7fdf5fed64eb6a7caf99b3e12efcf9d60e311c
 runc version: v1.1.14-0-g2c9f560
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.8.0-1016-aws
 Operating System: Ubuntu 22.04.5 LTS
 OSType: linux
 Architecture: aarch64
 CPUs: 16
 Total Memory: 30.75GiB
 Name: bc09d8dcc01d
 ID: 7c8981bc-d183-47c2-8e22-03ad2c38f85e
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Additional Info
What I'm trying to achieve is to keep only the layer caches related to the latest build and prune everything else, i.e. keep only the artifacts of the most recent build. In theory, docker buildx prune --filter="until=${until}s" should do that (where until = now() - build_start_timestamp), and it does seem to work on e.g. macOS (Docker 27.2.0) with my test Dockerfile. In practice, on Linux with a real, heavy Dockerfile, it doesn't work as expected, probably because of the effect described above (unrelated caches being marked as "recently used").
I also tried downgrading to 27.2.0 on Linux (both docker-ce and docker-ce-cli); it didn't help, and the effect is the same.
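For reference, the "prune everything older than the latest build" workflow described above can be sketched roughly as follows. This is a minimal sketch, assuming a POSIX shell and whole-second resolution; the helper names (elapsed_seconds, until_filter, prune_older_than_build) are hypothetical and are not part of docker or buildx.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not docker/buildx APIs.

# Seconds elapsed between a start timestamp and "now" (both in epoch seconds).
# The second argument is optional and defaults to the current time.
elapsed_seconds() {
  start="$1"
  now="${2:-$(date +%s)}"
  echo $(( now - start ))
}

# Format a number of seconds as a prune filter value, e.g. 25 -> until=25s.
until_filter() {
  echo "until=${1}s"
}

# Prune cache records last used before the given build start timestamp.
# Defined but not called here, because it needs a running Docker daemon.
prune_older_than_build() {
  docker buildx prune --force --filter "$(until_filter "$(elapsed_seconds "$1")")"
}
```

In CI this would be used by recording build_start=$(date +%s) before the build and calling prune_older_than_build "$build_start" once the build finishes. With the behavior reported here, a single call may still leave parent records behind.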