docker buildx prune --filter="until=xyz" marks unrelated cache layers as "last used" and does not delete parents

Open dimikot opened this issue 1 year ago • 6 comments

Description

When running e.g.

docker buildx prune --filter="until=25s" 

after deleting some cache layers, the parents of those layers get their "last used" timestamp refreshed. This prevents prune from removing the entire subtree: instead, it removes only one leaf layer at a time.
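To make the suspected mechanism concrete, here is a toy model in Python. It is not BuildKit code: the `Record` type, the `prune_pass` function, and the rule "a record with children is never removed before its children" are all assumptions made for illustration. It only shows why refreshing the parent's timestamp on delete would leave one leaf removed per run.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    parent: Optional[str]  # id of the parent record, None for a root
    age_s: float           # seconds since the record was "last used"


def prune_pass(cache: Dict[str, Record], until_s: float, touch_parent: bool) -> List[str]:
    """Simulate one prune run and return the ids it removed, in order.

    A record is removable when it has no children left and its age exceeds
    until_s. With touch_parent=True, removing a record resets its parent's
    age to 0, which is the side effect hypothesized in this issue.
    """
    removed: List[str] = []
    progress = True
    while progress:
        progress = False
        for rid in list(cache):
            has_children = any(r.parent == rid for r in cache.values())
            if has_children or cache[rid].age_s <= until_s:
                continue
            parent = cache.pop(rid).parent
            removed.append(rid)
            progress = True
            if touch_parent and parent in cache:
                cache[parent].age_s = 0.0  # hypothesized "last used" refresh
    return removed
```

With a three-record chain where everything was last used 3 minutes ago and `until_s=25`, the model without the side effect removes the whole chain in one run, while the model with it stops after the leaf.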

Reproduce

This is actually very hard to reproduce, so I'm providing a screenshot from a real CI run. I built a quick Python tool that renders the results of du and prune as a tree and adds colors.

I run du. Look at cache id=byc4z0pb2ba29tm25nqbrdcpk (underlined with red) and its parent lac46mmewlr8bqd5f7ii95hgd (underlined with green). They were both last used 3 minutes ago.

Then, docker buildx prune --filter="until=25s" removes the old unreferenced caches, and it removes the red cache byc4z0pb2ba29tm25nqbrdcpk (which is correct). For some reason, it doesn't remove its green parent lac46mmewlr8bqd5f7ii95hgd (although it theoretically should).

After pruning, I run du again. Look at what happened to the green parent lac46mmewlr8bqd5f7ii95hgd (follow the arrows): it is now "Last used 1 second ago"! (As a reminder, before pruning it was "Last used 3 minutes ago".) In other words, prune updates the timestamp of a cache record that it does not delete. I think this may also be why it doesn't delete that green parent: since prune touches it, it no longer treats it as "older than 25s".

Image
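The check I did by eye on the screenshot can be sketched in Python as below. The function name `refreshed_survivors` and the input shape (a dict mapping cache id to "last used" age in seconds, already extracted from the du output) are assumptions for illustration; parsing the du output itself is left out. Since time only moves forward between the two du runs, any surviving record whose age got smaller must have been touched in between, either by prune or by some concurrent build.

```python
from typing import Dict, List


def refreshed_survivors(before: Dict[str, float], after: Dict[str, float]) -> List[str]:
    """Return ids present in both snapshots whose "last used" age decreased.

    Each snapshot maps a cache record id to its age in seconds, as shown
    by du before and after the prune run.
    """
    return sorted(
        rid for rid, age in after.items()
        if rid in before and age < before[rid]
    )


# Values taken from the screenshot described above.
before = {
    "byc4z0pb2ba29tm25nqbrdcpk": 180.0,  # red leaf, last used 3 minutes ago
    "lac46mmewlr8bqd5f7ii95hgd": 180.0,  # green parent, last used 3 minutes ago
}
after = {
    "lac46mmewlr8bqd5f7ii95hgd": 1.0,    # green parent, last used 1 second ago
}
suspects = refreshed_survivors(before, after)
```

Here `suspects` contains only the green parent, which matches what the screenshot shows.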

Expected behavior

  1. On that screenshot, both caches (red byc4z0pb2ba29tm25nqbrdcpk and its green parent lac46mmewlr8bqd5f7ii95hgd) should have been pruned, because both were last used more than 25s ago. But only the leaf cache was pruned.
  2. Or, at the very least, the timestamp of the green parent lac46mmewlr8bqd5f7ii95hgd should not be modified by pruning.

docker version

Client: Docker Engine - Community
 Version:           27.3.1
 API version:       1.47
 Go version:        go1.22.7
 Git commit:        ce12230
 Built:             Fri Sep 20 11:41:08 2024
 OS/Arch:           linux/arm64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          27.3.1
  API version:      1.47 (minimum version 1.24)
  Go version:       go1.22.7
  Git commit:       41ca978
  Built:            Fri Sep 20 11:41:08 2024
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.7.22
  GitCommit:        7f7fdf5fed64eb6a7caf99b3e12efcf9d60e311c
 runc:
  Version:          1.1.14
  GitCommit:        v1.1.14-0-g2c9f560
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client: Docker Engine - Community
 Version:    27.3.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.17.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.29.7
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 1
  Running: 0
  Paused: 0
  Stopped: 1
 Images: 1
 Server Version: 27.3.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7f7fdf5fed64eb6a7caf99b3e12efcf9d60e311c
 runc version: v1.1.14-0-g2c9f560
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.8.0-1016-aws
 Operating System: Ubuntu 22.04.5 LTS
 OSType: linux
 Architecture: aarch64
 CPUs: 16
 Total Memory: 30.75GiB
 Name: bc09d8dcc01d
 ID: 7c8981bc-d183-47c2-8e22-03ad2c38f85e
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

Additional Info

What I'm trying to achieve with all this is to keep only the layer caches related to the latest build and prune everything else, i.e. keep only the artifacts of the most recent build. Theoretically, docker buildx prune --filter="until=${until}s" should do it (where until = now() - build_start_timestamp), and in fact it seems to do so on e.g. macOS (docker 27.2.0) with my test Dockerfile. But in practice, on Linux with a real, heavy Dockerfile, it doesn't work as expected, probably due to the effect explained above (unrelated caches being marked as "recently used").
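For reference, the until computation could look roughly like this in Python. The helper name `prune_command` is made up for this sketch, and `--force` is my addition so that a CI job is not blocked by the confirmation prompt. The elapsed time is rounded up rather than down, so that a record used a fraction of a second after the build started is not treated as older than the window.

```python
import math
import time
from typing import List, Optional


def prune_command(build_start: float, now: Optional[float] = None) -> List[str]:
    """Build the prune command for everything not used since build_start.

    build_start and now are Unix timestamps in seconds; now defaults to
    the current time.
    """
    if now is None:
        now = time.time()
    # until = now() - build_start_timestamp, rounded up to whole seconds.
    until_s = max(0, math.ceil(now - build_start))
    return ["docker", "buildx", "prune", "--force", f"--filter=until={until_s}s"]
```

A CI step would record `time.time()` right before the build and later pass the resulting list to `subprocess.run(..., check=True)`.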

I also tried downgrading to 27.2.0 on Linux (both docker-ce and docker-ce-cli); it didn't help, the effect is the same.

dimikot, Oct 13 '24 10:10