Image Deletion & Garbage Collection

Open amouat opened this issue 5 years ago • 3 comments

This is a fraught topic that other registries have struggled with. It's important to get right.

What currently happens on deletion isn't correct; it was done that way to pass the original OCI tests. The DELETE endpoint works and will happily delete whatever blob is behind it, regardless of whether that blob is currently used in an image. The reasoning was apparently that a user might need to quickly remove a blob that shouldn't have been uploaded, e.g. one that contains sensitive information. There is currently no "GC", so blobs can linger that are not referenced by any manifest; it's entirely up to the user to delete them.

A better way to do things (and I think this is what some registries do) is to have deletes work on manifests only. Once all tags pointing to a manifest are deleted (or perhaps if the digest is used?), the manifest blob is deleted. Any blobs referenced by that manifest and not referenced by another manifest should also be deleted. This is tricky to get right: there are race conditions etc., which explains why most registries do a GC sweep when nothing is happening. I would much rather avoid a pause-the-world GC sweep and tidy things up as they happen, but it's quite possible I don't understand the issues well enough and this isn't possible.
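
To make the proposal concrete, here is a minimal in-memory sketch of "tidy up as you go" deletion. It is not Trow's code: the `Registry` class, its fields, and `delete_tag` are hypothetical names invented for illustration, and the sketch deliberately ignores the race conditions mentioned above (for example, a push that starts referencing a blob while it is being removed), which a real implementation would have to handle with locking or similar.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Toy model of a registry; all names here are hypothetical."""
    tags: dict[str, str] = field(default_factory=dict)            # tag -> manifest digest
    manifests: dict[str, set[str]] = field(default_factory=dict)  # manifest digest -> blob digests
    blobs: set[str] = field(default_factory=set)                  # stored config/layer blobs

    def delete_tag(self, tag: str) -> set[str]:
        """Remove a tag and return every digest deleted as a consequence."""
        digest = self.tags.pop(tag)
        if digest in self.tags.values():
            # Another tag still points at this manifest, so nothing else goes.
            return set()
        referenced = self.manifests.pop(digest)
        # Blobs still needed by any remaining manifest must survive.
        still_used = set().union(*self.manifests.values())
        removed = referenced - still_used
        self.blobs -= removed
        return removed | {digest}


reg = Registry(
    tags={"app:v1": "m1", "app:v2": "m2"},
    manifests={"m1": {"base", "l1"}, "m2": {"base", "l2"}},
    blobs={"base", "l1", "l2"},
)
deleted = reg.delete_tag("app:v1")
```

In this example `base` is shared with the manifest behind `app:v2`, so only `l1` and the manifest `m1` are removed.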

amouat, Aug 21 '20 09:08

Hi there, I ran into this issue with my own trow registry and its ever-growing disk volume size, so I built a small GC script compumike/trow-garbage-collector which we are now running in production.

It looks at only the first line of /data/manifests/**/* (i.e. it considers only the latest version of the manifest for every tag), marks that manifest, reads the manifest JSON from /data/blobs/, and marks all of the referenced config and layer blobs. After completing this mark step for all tags in all images, the script then deletes from /data/blobs/ any remaining blobs that are not in use anywhere.

It addresses virtually all real-world race conditions (see MIN_GC_BLOB_AGE in README.md).
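
The mark-and-sweep idea described above can be sketched roughly as follows. This is not the code of compumike/trow-garbage-collector: the function name, its parameters, and the in-memory inputs are hypothetical, and the minimum-age check is only an assumption about what a setting like MIN_GC_BLOB_AGE is for (skipping blobs written so recently that a push may still be in progress).

```python
import time


def sweep_candidates(tag_manifests, manifest_refs, blob_mtimes,
                     min_age_seconds, now=None):
    """Return the digests of blobs that are safe to delete.

    tag_manifests: tag -> digest of the latest manifest for that tag
    manifest_refs: manifest digest -> config and layer digests it references
    blob_mtimes:   blob digest -> last-modified time (seconds since epoch)
    """
    now = time.time() if now is None else now

    # Mark phase: every live manifest and everything it references.
    marked = set()
    for digest in tag_manifests.values():
        marked.add(digest)
        marked.update(manifest_refs.get(digest, ()))

    # Sweep phase: unmarked blobs, excluding ones that are too young.
    return {
        digest
        for digest, mtime in blob_mtimes.items()
        if digest not in marked and now - mtime >= min_age_seconds
    }


candidates = sweep_candidates(
    tag_manifests={"app:latest": "m2"},
    manifest_refs={"m1": ["c1", "l1"], "m2": ["c2", "l1"]},
    blob_mtimes={"m1": 0, "m2": 0, "c1": 0, "c2": 0, "l1": 0, "new": 990},
    min_age_seconds=60,
    now=1000,
)
```

Here the superseded manifest `m1` and its config `c1` are swept, the shared layer `l1` is kept, and the unreferenced but freshly written blob `new` is left alone until it is older than the threshold.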

It's not perfect or elegant, but it's good enough for us, and we no longer have to think about trow's disk space growing and growing due to long-unused image layers.

Thank you for building trow!

compumike, Apr 15 '21 01:04

Thanks!

That looks amazing @compumike. I'll take a look :)

It is something high on my list to sort out and I'm disappointed we've not got to it yet. There is a lot of refactoring work going on currently that will hopefully make this easier.

amouat, Apr 16 '21 11:04

This,

Trow looks quite awesome but it badly needs a GC before it is usable in most applications.

This should include:

  • Blob GC (removal of non-referenced blobs)
  • Tag GC (removal of old tags according to configured LRU $date threshold)

This would allow disk usage to stay within reasonable bounds.
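
As a rough illustration of the tag GC item, the sketch below picks tags whose last use is older than a configured threshold. It assumes the registry records a last-used timestamp per tag, which nothing in this thread says Trow does; `expired_tags`, `last_used`, and `max_idle` are hypothetical names.

```python
from datetime import datetime, timedelta


def expired_tags(last_used, now, max_idle):
    """Return, sorted, the tags not used within max_idle of now.

    last_used: tag -> datetime of the most recent pull or push (assumed to exist)
    """
    return sorted(tag for tag, ts in last_used.items() if now - ts > max_idle)


stale = expired_tags(
    last_used={
        "app:v1": datetime(2022, 1, 1),
        "app:v2": datetime(2022, 1, 24),
    },
    now=datetime(2022, 1, 25),
    max_idle=timedelta(days=14),
)
```

Removing the returned tags first and then running a blob GC would free any manifests and layers that only those tags kept alive.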

splitice, Jan 25 '22 03:01