Automatically clean up docker images in the registry without a tag pointing to them

Status: Open. kolaente opened this issue 3 years ago • 19 comments

When pushing new docker images for an existing tag, the old image still exists and uses up storage on the server. While you can use images just by pointing to their sha, I've yet to find someone who actively does that. For my own registry (Portus) I have a cron job that automatically removes everything that does not have a tag pointing to it. Docker even has a command for this.

Having a cleanup job like that would make it possible to keep old versions while still solving the storage space problem.
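The cron-style cleanup described above boils down to a set difference between all stored manifests and the manifests some tag points to. A minimal Go sketch, assuming a hypothetical in-memory model (a tag-to-digest map and a list of stored digests) rather than Gitea's or Portus's actual storage layer. It deliberately ignores manifests referenced by other manifests, such as the per-architecture entries of a multi-arch image.

```go
package main

import (
	"fmt"
	"sort"
)

// untaggedManifests returns every stored manifest digest that no tag points to.
// Hypothetical model: tags maps a tag name to the digest it currently
// references; manifests lists every stored manifest digest.
func untaggedManifests(tags map[string]string, manifests []string) []string {
	referenced := make(map[string]bool, len(tags))
	for _, digest := range tags {
		referenced[digest] = true
	}
	var orphans []string
	for _, digest := range manifests {
		if !referenced[digest] {
			orphans = append(orphans, digest)
		}
	}
	sort.Strings(orphans) // deterministic order for logging and deletion
	return orphans
}

func main() {
	// "sha256:ccc" stands for an old image whose tag was re-pushed elsewhere.
	tags := map[string]string{
		"latest": "sha256:bbb",
		"v1.0":   "sha256:aaa",
	}
	manifests := []string{"sha256:aaa", "sha256:bbb", "sha256:ccc"}
	fmt.Println(untaggedManifests(tags, manifests))
}
```

A cron job would then delete whatever this returns; keeping old tagged versions is unaffected because only unreferenced digests are selected.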

@KN4CK3R in https://github.com/go-gitea/gitea/issues/21658#issuecomment-1301794468:

No, only if it's "older than" or not matched by the "keep pattern". But it should be no problem to add special logic here, because there is already custom handling of Version == "latest" for containers.

GitLab has an automatic garbage collection process for this: https://docs.gitlab.com/ee/administration/packages/container_registry.html#removing-untagged-manifests-and-unreferenced-layers

I think it's best to discuss this before implementing, mostly regarding these open questions:

  1. Should this be enabled automatically?
  2. Should this be a repo/org setting or a global config one?

kolaente, Nov 03 '22

Just for clarification, a repo has no impact on packages:

Should this be a ~~repo~~user/org setting or a global config one?

I checked again how I implemented this, and currently there are no untagged images in the container registry! (Exception: if you upload a multi-arch image, the individual per-architecture images are untagged.) If you tag and push an image, you can later pull that image by its tag or by its hash. If a tag is pushed again, the old tag/version is removed, and that deletes the hash reference too. So after that operation, no untagged image remains.

https://github.com/go-gitea/gitea/blob/f17edfaf5a31ea3f4e9152424b75c2c4986acbe3/routers/api/packages/container/manifest.go#L309-L312

So at the moment the cleanup does not need to remove untagged images, because there are none. The first question should be "Should Gitea keep untagged versions?"

KN4CK3R, Nov 03 '22

The use case sounds pretty similar to git gc, which IIRC we already run automatically as a cron job.

Should this be enabled automatically?

If it's stable, I'd say so.

Should this be a repo/org setting or a global config one?

I think global is sufficient. Ideally it should just be another cron job to clean up orphaned images, like we already do for orphaned git commits via git gc.

silverwind, Nov 03 '22

I came across this issue after experiencing the same effect. Building multi-arch images where only the manifest is tagged left me with lots of "packages" that have only a digest (the manifest itself had only one copy, since it was tagged).

Tagging each arch so that it gets overwritten makes the "details" tab rather impractical when you have too many different arches and versions (for matrix builds).

Should this be a repo/org setting or a global config one?

In my case, I would be happy with exactly the global mechanism described (similar to the cron job that runs git gc).

theodiem, Jan 05 '23

I am also looking for a similar feature; going out of my way to prune images manually is painful.

salasrod, Feb 03 '23

Doesn't #21658 resolve the issue?

lunny, Feb 03 '23

@lunny I didn't test it, but I don't think so. The PR lets you configure rules for removing tags; I just want to remove every image layer not associated with a tag.

kolaente, Feb 03 '23

Is this still happening?

peiwenxu, Sep 19 '23

No, I have 1.20.4 running and it does not happen.

jum, Sep 20 '23

No one has implemented this yet, but it's definitely a vital feature to conserve disk space.

Maybe it should be disabled by default to support pulling images by hash, which is a rare but valid use case.

silverwind, Sep 21 '23

Has anyone tried this cleanup rule?

[screenshot of a cleanup rule]

c521wy, Jan 27 '24

Has anyone tried this cleanup rule? [screenshot of a cleanup rule]

Using that rule and then checking with the preview yields no results; it does not look like it's working.

kolaente, Jan 30 '24

It looks like the official Docker registry implementation uses this function to find and remove all untagged layers, as described here.

@KN4CK3R As far as I understood from glancing over the code, Gitea does not just "embed" the official registry package, so it's not as easy as just copying or calling that function?
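The Docker registry's garbage collector is documented as a two-phase mark-and-sweep: first mark everything reachable, then delete whatever was not marked. A minimal Go sketch of the same idea applied to manifests, assuming a hypothetical in-memory model; the Manifest type and sweepCandidates function are illustrative, not Gitea or registry code, and layer blobs are not covered. Walking from each tag into an index's children keeps the untagged per-architecture manifests of a multi-arch image alive as long as a tagged index references them.

```go
package main

import (
	"fmt"
	"sort"
)

// Manifest is a hypothetical, simplified record of a stored manifest.
// For a multi-arch image index, Children lists the per-architecture
// manifest digests it references; for a plain image manifest it is empty.
type Manifest struct {
	Digest   string
	Children []string
}

// sweepCandidates marks every manifest reachable from a tag (directly or as
// a child of a tagged index) and returns the unmarked rest, sorted.
func sweepCandidates(tags map[string]string, manifests map[string]Manifest) []string {
	marked := make(map[string]bool)

	// Mark phase: depth-first walk starting from each tagged digest.
	var mark func(digest string)
	mark = func(digest string) {
		if marked[digest] {
			return
		}
		marked[digest] = true
		if m, ok := manifests[digest]; ok {
			for _, child := range m.Children {
				mark(child)
			}
		}
	}
	for _, digest := range tags {
		mark(digest)
	}

	// Sweep phase: anything not marked is unreachable and may be deleted.
	var unreachable []string
	for digest := range manifests {
		if !marked[digest] {
			unreachable = append(unreachable, digest)
		}
	}
	sort.Strings(unreachable)
	return unreachable
}

func main() {
	manifests := map[string]Manifest{
		"sha256:index": {Digest: "sha256:index", Children: []string{"sha256:amd64", "sha256:arm64"}},
		"sha256:amd64": {Digest: "sha256:amd64"},
		"sha256:arm64": {Digest: "sha256:arm64"},
		"sha256:old":   {Digest: "sha256:old"},
	}
	tags := map[string]string{"latest": "sha256:index"}
	fmt.Println(sweepCandidates(tags, manifests))
}
```

A rule that deletes every untagged version without this reachability check could remove the per-architecture manifests that a tagged index still points to.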

kolaente, Jan 30 '24

I'm facing the same issue with the latest version of Gitea.

mhkarimi1383, Mar 27 '24

Has anyone tried this cleanup rule?

[screenshot of a cleanup rule]

The rule above seems to work perfectly! It deletes all images that do not have a tag associated with them. I would just suggest using ^sha256:.+ instead, as the pattern could otherwise match a tag that for some reason contains sha256 in the middle.

ViRb3, Jul 27 '24