docker-registry

Implement a script to garbage collect orphan layers

Open • samalba opened this issue 11 years ago • 10 comments

It would be nice to have a script that uses the storage lib to iterate over the dataset, find orphan layers, and clean them up.

It's important to have a simulate mode that doesn't delete anything and displays the amount of data that would be saved.
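A minimal sketch of the requested script, assuming the storage lib can be reduced to three in-memory inputs: a map from image id to parent id, a map from image id to size in bytes, and the image ids that tags point at. None of these names come from docker-registry; `reachable_images` and `collect_orphans` are hypothetical. It uses mark-and-sweep: every tagged image and its ancestors are live, everything else is an orphan, and `simulate=True` only reports the space that would be reclaimed.

```python
"""Hypothetical orphan-layer collector with a simulate (dry-run) mode.

The real storage lib's API is not shown in this thread, so storage is modeled
as plain dicts: `parents` (id -> parent id), `sizes` (id -> bytes), `tagged`.
"""
from typing import Callable, Dict, Iterable, Optional, Set


def reachable_images(tagged: Iterable[str], parents: Dict[str, Optional[str]]) -> Set[str]:
    """Mark phase: every tagged image and all of its ancestors are live."""
    live: Set[str] = set()
    for tag_target in tagged:
        current: Optional[str] = tag_target
        # Walk up the parent chain until the base image or an already-marked image.
        while current is not None and current not in live:
            live.add(current)
            current = parents.get(current)
    return live


def collect_orphans(
    parents: Dict[str, Optional[str]],
    sizes: Dict[str, int],
    tagged: Iterable[str],
    delete: Callable[[str], None],
    simulate: bool = True,
) -> int:
    """Sweep phase: return bytes reclaimed (or reclaimable, in simulate mode)."""
    live = reachable_images(tagged, parents)
    orphans = sorted(set(parents) - live)
    total = 0
    for image_id in orphans:
        size = sizes.get(image_id, 0)
        total += size
        if simulate:
            print(f"would delete {image_id} ({size} bytes)")
        else:
            delete(image_id)
    label = "reclaimable" if simulate else "reclaimed"
    print(f"{label}: {total} bytes in {len(orphans)} images")
    return total
```

Running with `simulate=True` first and only then with `simulate=False` gives the "display the amount of data saved" behavior without touching storage.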

samalba, Jan 28 '14

Is anyone currently working on this? It would be really useful. :+1:

Soulou, Mar 21 '14

Same here. Does anyone have a work in progress on this?

lcarstensen, Mar 27 '14

Instead of periodically checking for orphans, I'd expect it to be easier to refcount images and remove them when their reference count drops to zero. You'd need some temporary code to fill in the initial refcounts for folks with existing storage instances that haven't been refcounting, but you could remove that migration code after a suitable transition period.
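A minimal sketch of the refcounting idea, including the temporary backfill step for existing storage. It assumes each image holds one reference per tag pointing at it plus one per child image, and that storage can be reduced to an id-to-parent map and a `delete` callback; `RefCounter` and its methods are hypothetical, not docker-registry code. Note that it mutates the `parents` map it is given.

```python
"""Hypothetical refcounted image deletion (not docker-registry code).

References: one per tag pointing at an image, one per child whose parent it is.
When a count drops to zero the image is deleted and its parent loses a
reference, which may cascade down the ancestry chain.
"""
from collections import Counter
from typing import Callable, Dict, Iterable, Optional


class RefCounter:
    def __init__(self, parents: Dict[str, Optional[str]], delete: Callable[[str], None]):
        self.parents = parents  # image id -> parent id (None for a base image)
        self.delete = delete    # storage callback that removes an image's files
        self.counts: Counter = Counter()

    def backfill(self, tagged: Iterable[str]) -> None:
        """One-off migration: derive counts from current tags and parent links."""
        self.counts.clear()
        for image_id in tagged:
            self.counts[image_id] += 1
        for parent in self.parents.values():
            if parent is not None:
                self.counts[parent] += 1

    def incref(self, image_id: str) -> None:
        """Call when a tag is set to image_id or a child of image_id is pushed."""
        self.counts[image_id] += 1

    def decref(self, image_id: Optional[str]) -> None:
        """Call when a tag is removed; cascades to parents that hit zero."""
        # A loop instead of recursion, so long ancestry chains cannot overflow the stack.
        while image_id is not None:
            self.counts[image_id] -= 1
            if self.counts[image_id] > 0:
                return
            del self.counts[image_id]
            self.delete(image_id)
            image_id = self.parents.pop(image_id, None)
```

One caveat of this design: images that were already orphaned before the migration get a count of zero from `backfill` and are never visited by `decref`, so they would still need the one-off orphan sweep requested at the top of this issue.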

wking, Mar 27 '14

I agree refcounting is better. I was about to write a small script to check our prod dataset and at least estimate the size of the orphans. But there's no work in progress right now on the open-source side.

samalba, Mar 27 '14

On Thu, Mar 27, 2014 at 10:02:47AM -0700, Sam Alba wrote:

> I agree refcounting is better.

This overlaps with #7. I think we should consolidate them into a single “remove refcounted images” issue; I don't mind whether it's this one or #7.

wking, Apr 16 '14

In case you come here looking for a script to clean up your private repository right now, here's the script we use to find unused images and report how much space they take.

It leaves a file in /tmp listing all the unused images, which you can use to delete them. I didn't want it to delete anything automatically, so it should be safe to run. :-) Caveat emptor and all that.

shepmaster, May 09 '14

@shepmaster thanks for the script! Did you have any luck actually removing the images your script deemed unused? I just ran it and plan to remove the orphan images, but I'm guessing _index_images also needs to reflect the deletions?

bjaglin, Jun 02 '14

@bjaglin I never actually deleted all of the reported images. I modified the script to focus on a single repository and moved those images to another directory for a while (as good as deleted, but restorable if something went horribly wrong). That worked fine. I also cleaned up _index_images as described in #7.
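A minimal sketch of the move-instead-of-delete approach, assuming local filesystem storage with one directory per image under `<root>/images/<id>` and a report file with one image id per line. Both the layout and the report format are assumptions, and `quarantine`/`restore` are hypothetical helpers, not part of either script mentioned here. It only moves image files; the `_index_images` cleanup from #7 is a separate step.

```python
"""Hypothetical quarantine of unused images: move them aside instead of deleting.

Assumes <root>/images/<id> holds each image and the report lists one id per
line; neither is documented docker-registry behavior.
"""
import shutil
from pathlib import Path
from typing import List


def quarantine(root: Path, report: Path, quarantine_dir: Path, dry_run: bool = True) -> List[str]:
    """Move each listed image directory into quarantine_dir; return the ids affected."""
    moved: List[str] = []
    if not dry_run:
        quarantine_dir.mkdir(parents=True, exist_ok=True)
    for line in report.read_text().splitlines():
        image_id = line.strip()
        # Bare ids only: skip blanks, anything with a separator, and dot-entries like "..".
        if not image_id or "/" in image_id or image_id.startswith("."):
            continue
        src = root / "images" / image_id
        if not src.is_dir():
            continue  # already gone or never existed: skip rather than fail
        if not dry_run:
            shutil.move(str(src), str(quarantine_dir / image_id))
        moved.append(image_id)
    return moved


def restore(root: Path, quarantine_dir: Path, image_id: str) -> None:
    """Undo a quarantine by moving the directory back under <root>/images."""
    shutil.move(str(quarantine_dir / image_id), str(root / "images" / image_id))
```

Once nothing has broken for a while, the quarantine directory can be removed for good; until then `restore` brings an image back.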

shepmaster, Jun 03 '14

For the record, to reclaim space I am currently experimenting with https://gist.github.com/bjaglin/1ff66c20c4bc4d9de522, which:

  1. uses @shepmaster's script to identify orphan images
  2. removes references to these images from the _index_images repo indices
  3. removes the actual images from disk

DISCLAIMER: Use it at your own risk, as it might break some invariants otherwise enforced by the registry, and it depends heavily on implementation details of the registry (version 0.7.0 at the time of writing).
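Step 2 of that list can be sketched as a pure function over one repository's `_index_images` contents. This assumes the index is a JSON list of objects that each carry an `"id"` key, which the thread does not confirm, so check your own registry's files first; `prune_index` is a hypothetical name. Pruning the indices before deleting files (step 3) means no index is left pointing at an image that is already gone.

```python
"""Hypothetical step 2: drop orphan ids from a repository's _index_images.

Assumes the index is a JSON list of objects with an "id" key; the format is
not documented in this thread.
"""
import json
from typing import Iterable, Tuple


def prune_index(index_json: str, orphans: Iterable[str]) -> Tuple[str, int]:
    """Return the rewritten index JSON and the number of entries removed."""
    orphan_set = set(orphans)
    entries = json.loads(index_json)
    # Keep every entry whose id is not an orphan; other keys (e.g. checksums) pass through.
    kept = [entry for entry in entries if entry.get("id") not in orphan_set]
    return json.dumps(kept), len(entries) - len(kept)
```

Because it takes and returns strings, the same function works whether the index is read from local disk or from another storage backend.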

bjaglin, Jun 03 '14

+1

xiakunhou, Jun 25 '15