image garbage collection in dev clusters

Open nicks opened this issue 4 years ago • 7 comments

In the #tilt channel, Dan C-P writes:

We seem to be seeing bloat in our tilt-managed kind control plane that I've narrowed down to what I think are dangling images (or whatever) that build up over time.

This problem is probably unique to Tilt, because Tilt can create a lot of temporary images.

When I dug around a bit, I saw a comment that implied that image garbage collection is disabled in Kind

https://github.com/kubernetes-sigs/kind/blob/master/pkg/cluster/internal/kubeadm/config.go#L231

I'm not totally sure yet how/if tilt (or ctlptl) should address this. Need to talk to Kubernetes experts who know more about the expected interop here than I do. You could imagine a controller that went into the cluster and deleted these images, similar to Tilt's local garbage collector

nicks avatar Feb 19 '21 21:02 nicks

I verified that execing into the node like

docker exec -it kind-control-plane bash

and running

crictl rmi --prune

can sometimes fix a lot of problems
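A minimal sketch of scripting those two steps, assuming the default kind node container name `kind-control-plane` and that `docker` is on the PATH. The helper names `prune_command` and `prune_node` are hypothetical, and `-it` is dropped because nothing here is interactive:

```python
import subprocess


def prune_command(node="kind-control-plane"):
    """Build the argv that runs `crictl rmi --prune` inside a kind node container."""
    return ["docker", "exec", node, "crictl", "rmi", "--prune"]


def prune_node(node="kind-control-plane"):
    """Run the prune inside the node and return crictl's stdout.

    Raises subprocess.CalledProcessError if docker or crictl exits non-zero.
    """
    result = subprocess.run(
        prune_command(node), check=True, capture_output=True, text=True
    )
    return result.stdout
```

Calling `prune_node()` against a running cluster should have the same effect as the manual commands above; multi-node clusters would need one call per node container.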

nicks avatar Feb 19 '21 21:02 nicks

here's a good discussion of this problem in more detail: https://github.com/kubernetes-sigs/kind/issues/735

feels like this also ties into https://github.com/tilt-dev/tilt/issues/2102

in that tilt knows (in some cases) that an image is ephemeral and isn't going to be used again, but needs a way to propagate that to everyone who needs to know

nicks avatar Feb 19 '21 22:02 nicks

In our experience, running crictl rmi --prune frees A LOT of space held by dangling images in the control plane. The buildup is worse on long-lived kind clusters.

We recently implemented a feature that runs the in-control-plane pruning on tilt down, at most once every four hours. It's still early, but I'm hopeful we'll see promising returns.
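As a rough illustration of that kind of throttle (not the actual implementation described here), one option is to record the last prune time in a stamp file and skip the prune when it ran too recently. The names `prune_is_due`, `read_last_run`, and `maybe_prune`, and the stamp-file approach itself, are assumptions:

```python
import time
from pathlib import Path

FOUR_HOURS = 4 * 60 * 60  # minimum gap between prunes, in seconds


def prune_is_due(last_run, now, min_interval=FOUR_HOURS):
    """True if no prune has been recorded or the last one is old enough."""
    return last_run is None or now - last_run >= min_interval


def read_last_run(stamp):
    """Return the recorded prune time, or None if the stamp is missing or unreadable."""
    try:
        return float(Path(stamp).read_text().strip())
    except (FileNotFoundError, ValueError):
        return None


def maybe_prune(stamp, prune, now=None):
    """Call `prune()` only if it is due, then record the time. Returns True if it ran."""
    now = time.time() if now is None else now
    if not prune_is_due(read_last_run(stamp), now):
        return False
    prune()
    Path(stamp).write_text(str(now))
    return True
```

The `prune` argument is any zero-argument callable, so the throttle stays independent of how the prune itself is executed.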

djcp avatar Feb 25 '21 18:02 djcp

Is there a way to log out old EXPECTED_REFs?

i think tilt should have a custom_prune counterpart to custom_build

majidaldo avatar Nov 07 '21 18:11 majidaldo

@majidaldo for what it's worth, the existing garbage collector doesn't look at old expected refs. It just searches the image index for images built by tilt. The code is here:

https://github.com/tilt-dev/tilt/blob/master/internal/engine/dockerprune/docker_pruner.go#L144

(There's not really a strong reason why it needs to be part of Tilt itself, it could pretty easily be run as an extension. You can read the images currently used from the tilt api as tilt describe imagemap, to ensure you're not deleting an in-use image)
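A sketch of the in-use check such an extension might do. It assumes the ImageMap objects are available as JSON (for example from something like `tilt get imagemap -o json`) and that each object carries its current ref under `status.image`; both the command form and the field path are assumptions to verify against your Tilt version, and the function names are hypothetical:

```python
import json


def in_use_images(imagemap_json):
    """Collect image refs from a JSON list of ImageMap objects.

    Assumed shape: {"items": [{"status": {"image": "<ref>"}}, ...]}.
    Objects without a ref yet (nothing built) are skipped.
    """
    doc = json.loads(imagemap_json)
    refs = set()
    for item in doc.get("items", []):
        ref = item.get("status", {}).get("image")
        if ref:
            refs.add(ref)
    return refs


def safe_to_delete(candidates, in_use):
    """Keep only the candidate refs that no ImageMap currently points at."""
    return [ref for ref in candidates if ref not in in_use]
```

Filtering candidates through `safe_to_delete` before removing anything is what keeps the pruner from deleting an image that a running resource still depends on.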

I think what we should probably do is move docker-pruner into a separate repo, port it to use https://github.com/google/go-containerregistry (a library for interacting with different container image indexes), and have tilt periodically run it against every image store it knows about (the local docker image store, any registries it pushes to, and the Kind CRI)

nicks avatar Nov 08 '21 19:11 nicks

FWIW, as mentioned upstream, I'd like to see kind handle this, but it's tricky: kubelet's existing GC isn't terribly well suited to kind.

There may be some layering violations, but it's possible that tilt could leverage better information here, since AIUI it could theoretically know when it has loaded a newer version of an image and periodically request deleting older versions ... 🤔
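To make the "delete older versions" idea concrete, here is a minimal sketch that groups refs by repository and keeps only the newest few. It assumes the caller already has `(ref, created_timestamp)` pairs from some image store; the function names and the `keep` parameter are hypothetical, and this is not how Tilt's pruner works today:

```python
from collections import defaultdict


def repository(ref):
    """Strip the tag or digest: 'localhost:5000/app:tilt-1' -> 'localhost:5000/app'."""
    if "@" in ref:
        return ref.split("@", 1)[0]
    name, sep, tag = ref.rpartition(":")
    # A ':' whose right-hand side contains '/' is a registry port, not a tag.
    if not sep or "/" in tag:
        return ref
    return name


def stale_refs(images, keep=1):
    """Return refs that are older than the newest `keep` images of their repository.

    `images` is an iterable of (ref, created_unix_seconds) pairs.
    """
    by_repo = defaultdict(list)
    for ref, created in images:
        by_repo[repository(ref)].append((created, ref))
    stale = []
    for entries in by_repo.values():
        entries.sort(reverse=True)  # newest first
        stale.extend(ref for _, ref in entries[keep:])
    return sorted(stale)
```

Anything returned here would still need to be checked against the images currently in use before it is actually removed from the node.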

BenTheElder avatar Aug 08 '22 18:08 BenTheElder

ya, tilt attaches a bit more metadata to the image about "why" you're building it. We already do GC in the local image store, we just need to augment it a bit to tell kubernetes what images we think should be cleaned up.

nicks avatar Aug 09 '22 02:08 nicks