image garbage collection in dev clusters
In the #tilt channel, Dan C-P writes:
We seem to be seeing bloat in our tilt-managed kind control plane that I've narrowed down to what I think are dangling images (or whatever) that build up over time.
This problem is probably unique to Tilt, because Tilt can create a lot of temporary images.
When I dug around a bit, I saw a comment that implied that image garbage collection is disabled in Kind
https://github.com/kubernetes-sigs/kind/blob/master/pkg/cluster/internal/kubeadm/config.go#L231
I'm not totally sure yet how/if tilt (or ctlptl) should address this. Need to talk to Kubernetes experts who know more about the expected interop here than I do. You could imagine a controller that went into the cluster and deleted these images, similar to Tilt's local garbage collector
I verified that execing into the node like
docker exec -it kind-control-plane bash
and running
crictl rmi --prune
can sometimes fix a lot of problems
here's a good discussion of this problem in more detail: https://github.com/kubernetes-sigs/kind/issues/735
feels like this also ties into https://github.com/tilt-dev/tilt/issues/2102
in that tilt knows (in some cases) that an image is ephemeral and isn't going to be used again, but needs a way to propagate that to everyone who needs to know
In our experience, running crictl rmi --prune
cleans up a LOT of space held by dangling images in the control plane, and the problem is exacerbated by long-lived kind clusters.
We recently implemented a feature to run the in-control-plane pruning on tilt down, at most once every four hours. It's still early, but I'm hopeful we'll see promising returns.
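For illustration, here's a minimal Tiltfile-style sketch (Starlark) of that kind of down-hook, assuming a single-node kind cluster whose node container is named kind-control-plane and a local stamp file to enforce the four-hour cap; the feature described above may well be implemented differently.

PRUNE_STAMP = '.last-kind-prune'

def maybe_prune_kind_images():
    # Prune only if the stamp file is missing or older than 240 minutes.
    cmd = ('if [ ! -f %s ] || [ -n "$(find %s -mmin +240)" ]; then ' % (PRUNE_STAMP, PRUNE_STAMP) +
           'docker exec kind-control-plane crictl rmi --prune && touch %s; fi' % PRUNE_STAMP)
    local(cmd, quiet=True)

# config.tilt_subcommand lets a Tiltfile run code only on `tilt down`.
if config.tilt_subcommand == 'down':
    maybe_prune_kind_images()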
Is there a way to log out old EXPECTED_REFs?
i think tilt should have a custom_prune counterpart to custom_build
@majidaldo for what it's worth, the existing garbage collector doesn't look at old expected refs. It just searches the image index for images built by tilt. The code is here:
https://github.com/tilt-dev/tilt/blob/master/internal/engine/dockerprune/docker_pruner.go#L144
(There's not really a strong reason why it needs to be part of Tilt itself; it could pretty easily be run as an extension. You can read the images currently in use from the tilt api with tilt describe imagemap, to ensure you're not deleting an in-use image.)
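To make the extension idea concrete, here's a rough Python sketch that asks the Tilt API for the in-use refs and then deletes local images that match a naming prefix but are no longer referenced. The tilt get imagemaps -o json invocation, the status.image field, and IMAGE_PREFIX are assumptions to verify against your Tilt version, not a confirmed interface.

import json
import subprocess

# Hypothetical prefix matching how your Tiltfile names its images.
IMAGE_PREFIX = 'localhost:5000/myapp'

def in_use_refs():
    # Assumption: `tilt get imagemaps -o json` returns kubectl-style JSON and
    # each ImageMap records its built ref under status.image.
    out = subprocess.check_output(['tilt', 'get', 'imagemaps', '-o', 'json'])
    items = json.loads(out).get('items', [])
    return {im.get('status', {}).get('image') for im in items} - {None}

def stale_local_refs(keep):
    # List local images by repo:tag and keep only stale ones under our prefix.
    out = subprocess.check_output(
        ['docker', 'images', '--format', '{{.Repository}}:{{.Tag}}'])
    return [ref for ref in out.decode().split()
            if ref.startswith(IMAGE_PREFIX) and ref not in keep]

if __name__ == '__main__':
    for ref in stale_local_refs(in_use_refs()):
        subprocess.run(['docker', 'rmi', ref], check=False)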
I think what we should probably do is move docker-pruner into a separate repo, port it to use https://github.com/google/go-containerregistry (a library for interacting with different container image indexes), and have tilt periodically run it against every image store it knows about (the local docker image store, any registries it pushes to, and the Kind CRI).
FWIW, as mentioned upstream, I'd like to see kind handle this, but it's tricky: kubelet's existing GC isn't terribly well suited to kind.
There are maybe some layering violations, but it's possible that tilt could leverage better information here, since AIUI it could theoretically know when it has loaded a newer version of an image and periodically request deleting older versions ... 🤔
ya, tilt attaches a bit more metadata to the image about "why" you're building it. We already do GC in the local image store; we just need to augment it a bit to tell kubernetes which images we think should be cleaned up.
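As a crude illustration of "telling the cluster", here's a sketch that drops a single ref from a kind node's CRI store by shelling out to crictl, again assuming the default kind-control-plane node name; a real implementation would presumably talk to the CRI or registry APIs directly rather than exec'ing into the node.

import subprocess

def remove_image_from_kind(ref, node='kind-control-plane'):
    # Delete one image ref from the kind node's containerd store via crictl.
    # A stand-in for a proper CRI client; assumes crictl is present on the node.
    subprocess.run(['docker', 'exec', node, 'crictl', 'rmi', ref], check=False)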