Registry cleaning

Open gaetansnl opened this issue 6 years ago • 22 comments

Hello! First, I wanted to thank you for your work on Tilt. I tried multiple tools before, but most of them don't support custom scripts and such advanced sync capabilities... I can now rebuild dependencies on each server without a full server rebuild 😄

I build images locally and push them to a registry on another server, and I noticed that the images are not removed. It seems they are also kept locally. So I'm afraid of blowing up the server's storage capacity, even though most of the time new images reuse layers from previous ones. I just got "Kubelet has disk pressure".

Is there a suggested approach to remove some of them, or should I write a script manually?

Thank you

gaetansnl avatar Aug 23 '19 16:08 gaetansnl

Thanks for the kind words!

I don't know of a way to do this right now. Most heavy users of Tilt use live_update (https://docs.tilt.dev/live_update_tutorial.html). It copies files and runs build commands in-place, so you don't run out of disk space copying around lots of short-lived images.
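
For reference, a minimal sketch of what that looks like in a Tiltfile (the image name, paths, and commands here are placeholders, not a drop-in config):

# sync changed files into the running container instead of rebuilding the image
docker_build(
    'example-image',  # placeholder image name
    '.',
    live_update=[
        # copy changed files straight into the running container
        sync('.', '/app'),
        # re-run the dependency install in place only when package.json changes
        run('npm install', trigger=['package.json']),
    ],
)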

I'm sure a garbage collector would help others. But I'm not totally sure what it might look like. Maybe something that deletes all image tags with the "tilt-" prefix, then runs the registry garbage collector?

nicks avatar Aug 24 '19 00:08 nicks

I use live_update, but a new image is created when I restart Tilt or when I make changes to package.json, for example. I know skaffold prunes images, but I'm not sure how it decides what to prune, because I still have a lot of skaffold images.

I was thinking it would be nice to have, at any time, only one tag per docker_build(), and to not remove it when Tilt is closed. That way, on the next start: 1) the cache is still there, and 2) old tags are removed. Removing by the "tilt-" prefix would work locally, but could it cause issues if multiple developers are using the same registry?

gaetansnl avatar Aug 24 '19 06:08 gaetansnl

To add to this, for me at the very least, I found that using live_update wasn't worth it with Rust as the performance difference is negligible and it would require me to have a completely different build pipeline for development. Because of that, I always rebuild images on each change and have been hit with disk pressure quite a lot.

dotellie avatar Aug 27 '19 13:08 dotellie

Hi all, sorry for the late follow-up, but do you happen to know what k8s cluster you were running on when you hit these errors? @dotellie @gaetansnl

maiamcc avatar Oct 03 '19 15:10 maiamcc

@maiamcc No worries! I assume the answer you're looking for in my case is minikube? I'm running basically default settings on Linux with KVM, if that helps.

dotellie avatar Oct 03 '19 15:10 dotellie

👋 hey folks! Tilt now has a built-in docker prune-er.

By default, the docker pruner runs once after startup (as soon as all of your pending builds are done) and once every hour thereafter, and prunes:

  • stopped containers built by Tilt that are at least 6 hours old
  • images built by Tilt and associated with this Tilt run that are at least 6 hours old
  • dangling build caches that are at least 6 hours old

All of these settings are configurable: you can set the max age of containers/images/caches to keep, set the interval at which the pruner runs, or alternately set it to run every X builds.
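
For example, a Tiltfile sketch (the values shown just mirror the defaults described above):

# Tiltfile: tune the built-in docker pruner
docker_prune_settings(
    max_age_mins=360,  # prune Tilt-built containers/images/caches older than 6 hours
    interval_hrs=1,    # run the pruner hourly
    num_builds=0,      # or set >0 to run it every N builds instead of on a timer
)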

This is out in the latest release, hopefully it'll help with @dotellie's issue -- tho @gaetansnl it seems like yours is potentially different and won't just be solved by docker images prune, is that correct?

maiamcc avatar Oct 21 '19 16:10 maiamcc

Hello. Thank you for this feature 🎉, I'll try it as soon as I can. It will at least partially solve the problem: if I understand correctly, images inside the registry are still not removed, but it should at least avoid disk issues locally.

gaetansnl avatar Oct 22 '19 07:10 gaetansnl

This becomes a bigger issue when you're using a local registry set up by ctlptl, because those images never get cleaned up :\

nicks avatar Dec 07 '20 15:12 nicks

Not entirely sure if this is a recent development or not, but it seems that using a local registry is recommended/default for most setups (kind, k3d, minikube) — see examples in ctlptl README. And indeed, stuff in the local registry isn't being cleaned up: it isn't even possible to delete stuff from the local registry in its default configuration, and even if it was, a manual blob garbage collector invocation would be necessary afterwards. No wonder tilt doesn't attempt to clean up anything in the local registry.
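
For anyone experimenting with this by hand: on the stock registry:2 image, deletion has to be opted into explicitly, and blob space is only reclaimed by a separate GC pass. A rough sketch outside of ctlptl (the container name is arbitrary, and this is not how ctlptl configures its registry by default):

# standalone registry with the delete API enabled
docker run -d -p 5000:5000 --name dev-registry \
  -e REGISTRY_STORAGE_DELETE_ENABLED=true \
  registry:2

# deleting manifests via the API only unlinks them; blobs are reclaimed by an explicit GC
docker exec dev-registry registry garbage-collect /etc/docker/registry/config.yml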

On the other hand, tilt's FAQ claims that images are built in-cluster if possible. This certainly isn't the case with kind and a local registry, but the FAQ entry mentions Minikube as well. https://docs.tilt.dev/choosing_clusters.html#minikube says local registry is supported there, so I'm wondering if perhaps the FAQ entry is outdated? Are there any benefits of using a local registry instead of building in-cluster that I'm missing? Is this a speed vs space tradeoff?

Anyway, assuming deleting and blob GC in the registry itself is enabled (https://github.com/tilt-dev/ctlptl/issues/247), I think what tilt could do is this:

  • after doing its local docker pruning, gather all the tags that are still there
  • drop any tags from the registry that aren't in the local docker (as these had presumably been pruned earlier)
  • trigger GC of blobs in the registry if this needs to be triggered manually
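
A rough manual sketch of that flow, assuming the registry has deletion enabled and skopeo and jq are available; localhost:5000/my-app and the ctlptl-registry container name are placeholders:

# tags that survived Tilt's local docker prune
keep=$(docker image ls --format '{{.Repository}}:{{.Tag}}' | grep ':tilt-' || true)

# drop registry tags that no longer exist locally (one repository shown)
for tag in $(skopeo list-tags --tls-verify=false docker://localhost:5000/my-app | jq -r '.Tags[]'); do
  echo "$keep" | grep -q ":$tag\$" || \
    skopeo delete --tls-verify=false docker://localhost:5000/my-app:$tag
done

# reclaim the blobs on disk
docker exec ctlptl-registry registry garbage-collect /etc/docker/registry/config.yml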

liskin avatar Aug 22 '22 18:08 liskin

re: "I'm wondering if perhaps the FAQ entry is outdated" - the minikube team completely broke insecure registries in v1.26 - https://github.com/kubernetes/minikube/issues/14480 . This is very recent. I've been telling people to use v1.25 until it's fixed upstream.

nicks avatar Aug 22 '22 18:08 nicks

Hey all, bumping this as it's been over a year since the last comment on this thread.

I'm using ctlptl to create a registry and cluster.

apiVersion: ctlptl.dev/v1alpha1
kind: Registry
name: neosync-dev-registry
---
apiVersion: ctlptl.dev/v1alpha1
kind: Cluster
product: kind
registry: neosync-dev-registry
name: kind-neosync-dev # must have kind- prefix

But roughly every few days (give or take, depending on how much active development I'm doing) I have to nuke my cluster and remove all of the images in docker to free up space.

I run this command: docker rmi -f $(docker images -aq) once everything is shut down to remove all images. Then I go into docker desktop and reclaim space using the "Disk usage" extension to further clean up my drive.

Seems @liskin is dealing with the same issue that I'm having, and I'm wondering if there are any updates on how to get this rectified? I love using tilt and have been using it for some time, but this is probably my biggest pain with it as of today.

nickzelei avatar Oct 25 '23 00:10 nickzelei

If using podman, this will remove all images that have tags beginning with tilt-:

podman rmi -f $(podman image ls --format '{{.ID}}' '*:tilt-*')

ianb-mp avatar Mar 14 '24 01:03 ianb-mp

I just wanted to add how this issue affects our team. I've been monitoring my Docker Desktop (on mac) environment over time from "fresh" starts to try to figure out why disk space gets sucked up. We use Tilt with minikube and the docker registry (as mentioned in a previous comment) so Docker Desktop is running a container for the registry and one for tilt.

I've noticed that over time, even if I'm regularly doing various forms of prune:

docker system prune -a --volumes
docker volume prune -a
tilt docker-prune

That eventually all the disk allocated to Docker Desktop is consumed, to the point where I can't rebuild a container on change without getting a "no space" error. Note that the last two commands don't really do anything if I run the first one, but I include them for completeness. My usual procedure once I start getting frequent "no space" issues despite pruning is to just restart it all: shut down tilt, stop its container, remove its container, stop and remove the local docker registry container, and then run all the prunes. Then I start the whole thing up again, let it rebuild our application images, and go back to whatever I was doing. Unsurprisingly, this starts with plenty of free space (even after the rebuild of our application images).

I decided to not restart the entire setup today when I was getting to no space issues. Instead, I first did a normal prune which reclaimed a fair bit of space but not all the way up to how much space is available after a "fresh" start. Then I exec-ed into the tilt container and ran a bunch of du -sh * commands on various directories to suss out where the tilt container was using up disk. Eventually I realized it was /var/lib/containerd/ and specifically /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs.

So I ran a containerd prune (crictl rmi --prune) from within the tilt container. This deleted a lot of stuff, but our application containers were still running fine, and it got the available disk space (as reported by Docker Desktop) back up to where it starts after a "fresh" start of our Tilt setup.

However, it also deleted a lot of stuff I wouldn't want it to: all the intermediate layers for the current "active" images, so the next rebuild of an application image missed cache. But at least I didn't have to restart everything or lose the images stored in the local docker registry (which is mostly the base images downloaded from external places).

So I think tilt is keeping intermediate layers for image versions that are no longer being used, and they add up over time. Many of the layers were many days old, so I don't think it's just that our use of the default docker_prune_settings was keeping around all the layers for the 2 most recent builds & within 6 hours. I especially don't think it's just prune settings, since pruning over time (many days) of running tilt reclaims less and less. Nonetheless, I am setting docker_prune_settings max_age_minutes so it only keeps around an hour's worth vs 6 hours by default.
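
For anyone else chasing the same thing, the diagnosis and cleanup boils down to roughly this (the node container is called minikube here purely for illustration; the name depends on your minikube/Docker Desktop setup):

# see where the disk went inside the cluster node container
docker exec -it minikube du -sh /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs

# reclaim it by pruning unused containerd images/layers
# CAUTION: this also drops cached layers for current images, so the next build misses cache
docker exec -it minikube crictl rmi --prune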

ludwick avatar Mar 20 '24 00:03 ludwick

Running out of space has been an issue for our team as well.

We've settled on periodically deleting everything in the registry and pruning everything, so we just run something like the following:

tilt down
docker exec -it -u root ctlptl-registry sh -c 'rm -r /var/lib/registry/docker/registry/v2/repositories/*; registry garbage-collect /etc/docker/registry/config.yml'
docker restart ctlptl-registry
docker exec -it kind-control-plane crictl rmi --prune
docker system prune -af
tilt up

Depending on your setup, this may not be safe. Not sure if we're making some bad assumptions, but it seems like there are copies of the data in Docker, kind, and the local registry, so this tries to nuke the images in all of them (and CAUTION: the docker system prune potentially removes more than just Tilt's images from Docker).

wuservices avatar Apr 09 '24 20:04 wuservices

re: "I'm wondering if perhaps the FAQ entry is outdated" - the minikube team completely broke insecure registries in v1.26 - kubernetes/minikube#14480 . This is very recent. I've been telling people to use v1.25 until it's fixed upstream.

This seems like a misunderstanding; I wasn't concerned about insecure registries at all, but rather about building in-cluster. If I understand it correctly, a local registry isn't needed at all if images are built in-cluster, as they're immediately available for k8s to run and don't need to be copied all around (push from docker to the registry, then pull from the registry into k8s).

The https://docs.tilt.dev/faq.html#q-all-the-tilt-examples-store-the-image-at-gcrio-isnt-it-really-slow-to-push-images-up-to-googles-remote-repository-for-local-development FAQ entry implies that building in cluster is desirable (faster!) and is the norm, but https://docs.tilt.dev/choosing_clusters confusingly talks a lot about local registries, making it seem that somehow not having to push/pull over the internet and only copying several gigabytes of data locally is all we can hope for.

So yeah I suppose "outdated" wasn't worded well. Should have gone for "confusing".

Anyway, now that almost 2 years have somehow managed to pass by while I wasn't looking (and I mean that quite literally, life happened…), the entry happens to be outdated indeed:

  • setting ImageNeverPull is not a thing any more – https://github.com/tilt-dev/tilt/pull/6277
  • additional products (Rancher, Colima, Orbstack) support building in-cluster if configured right – https://github.com/tilt-dev/tilt/blob/37be1ded69d09e97791335ef6cf2bacd2bbf1ebb/internal/docker/env.go#L350-L419

And yeah, now that I got back to this and dug a bit deeper to be able to answer, I realised that we (the company I work for where we happen to use tilt) don't configure stuff right, so despite using Colima, images aren't built in cluster, but instead get copied to and from a local registry, and therefore we run into this very problem discussed in this issue (https://github.com/tilt-dev/tilt/issues/2102#issuecomment-739984779). Perhaps for no good reason, because if we just used colima --kubernetes, everything would be fine? (well, I've been on Linux the whole time, just running kind in a rootless docker, so it wouldn't be fine for me, but it could be fine for everyone else)

https://docs.tilt.dev/choosing_clusters could use an entry for Colima btw… And for Orbstack :-)

liskin avatar Jun 13 '24 15:06 liskin

Seems @liskin is dealing with the same issue that I'm having, and I'm wondering if there are any updates on how to get this rectified? I love using tilt and have been using it for some time, but this is probably my biggest pain with it as of today.

Apologies for the late reply, but there hasn't been much news so perhaps I might have something useful still.

First, as I realised halfway through writing the above reply to Nick, one thing is that we're all probably using Tilt wrong. Apparently it can be used without a registry, building images directly in the k8s cluster, which saves space and time and also possibly makes this whole garbage collection problem go away. So if one doesn't need a registry for some other reason (and I don't know what that reason might be but I'm somewhat certain we don't have one), then configuring things so images aren't pushed/pulled to/from a registry would be the best way to avoid this problem.

If you, however, do run a local registry (like I do, because I happen to use ctlptl with kind on Linux, mostly because of being completely clueless about k8s and wanting to keep it in an isolated box as much as possible, so letting it use my host's docker is certainly not an option), then… well then you could do all sorts of weird shenanigans like we resorted to doing because we had no idea we could have avoided it.

Like… (and I'm not making this up, we really do all this)

  • docker exec into the kind container and crictl ps to find all the images that are currently in use and crictl rmi those that aren't (which k8s should do itself but maybe doesn't or doesn't enough or whatever I don't know)
  • echo the list of used images out of that docker exec crictl ps … so we know what needs to stay in the local registry
  • use skopeo list-tags and skopeo delete to remove everything that shouldn't stay
  • docker exec ctlptl-registry registry garbage-collect /etc/docker/registry/config.yml because there's no API for actually removing the data from disk (of course, why would there be, docker registries are meant to run with infinite storage, why would you assume otherwise?)
  • docker image ls and docker rmi because while tilt does try to garbage collect images it builds it fails to do it correctly (https://github.com/tilt-dev/tilt/issues/4596) so images pile up anyway
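
Stitched together, the whole dance is roughly this (heavily simplified; kind-control-plane, ctlptl-registry and the tilt- tag prefix are specific to our setup):

# inside the kind node: note which images are still in use, then drop the rest
docker exec kind-control-plane crictl ps
docker exec kind-control-plane crictl rmi --prune

# then the skopeo list-tags / skopeo delete / registry garbage-collect dance against
# ctlptl-registry, and finally the Tilt-built images left behind on the host:
docker rmi $(docker image ls --format '{{.Repository}}:{{.Tag}}' | grep ':tilt-')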

So yeah it's dumb and I'm maybe a bit bitter about it. Now let me go and find out how to switch to building in cluster so I can forget about all this. :-)

liskin avatar Jun 13 '24 16:06 liskin