Docker occasionally re-pulls layers it already has

Open tazjin opened this issue 6 years ago • 5 comments

There are some cases where the Docker client re-pulls layers that it already has (that is, a layer with the same hash is already part of one of its other images). It seems to use more than just the hash to determine whether a layer needs pulling (huh?); I need to dig into this.

tazjin • Aug 17 '19 09:08

Docker downloads layers using the LayerDownloadManager, implemented in moby/distribution/xfer/download.go.

The logic in Download deals with skipping layers that already exist, but I'm still working my way through it.

tazjin • Aug 17 '19 11:08

This is probably due to https://github.com/moby/moby/issues/38446.

My theory is that this is a leftover in Docker from the era before content-addressable image layers, where the order actually mattered.

tazjin • Oct 03 '19 17:10

From what I remember, the docker engine (moby?) still doesn't store layers in the Registry v2 sense, but in the ordered v1 model, where each layer contains "the whole layer stack so far". CoW (via OverlayFS or whatever else) is set up per layer stack...
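
To make the ordering dependence concrete, here is a minimal Python sketch of the chain ID scheme described in the OCI image spec, where each ID is derived from the layer's own diff ID plus the chain ID of everything beneath it. The layer names and contents (`base`, `curl`, `git`) are made up for illustration, and treating the engine's layer store as keyed by such chain IDs is an assumption here, not a reading of Moby's actual code.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return a sha256 digest string in the usual 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def chain_ids(diff_ids):
    """Return one chain ID per layer for an ordered list of diff IDs.

    ChainID(L0)      = DiffID(L0)
    ChainID(L0..Ln)  = Digest(ChainID(L0..Ln-1) + " " + DiffID(Ln))
    """
    chain = []
    for diff_id in diff_ids:
        if not chain:
            # The bottom layer's chain ID is just its diff ID.
            chain.append(diff_id)
        else:
            # Every other chain ID folds in the whole stack below it.
            chain.append(digest((chain[-1] + " " + diff_id).encode()))
    return chain


# Hypothetical layers: stand-ins for the uncompressed layer tarballs.
base = digest(b"base layer")
curl = digest(b"curl layer")
git = digest(b"git layer")

# Two images with identical layers, stacked in a different order.
image_a = chain_ids([base, curl, git])
image_b = chain_ids([base, git, curl])

# Only the shared bottom layer ends up with the same chain ID.
shared = set(image_a) & set(image_b)
print(shared == {base})
```

Under this model, a store that looks layers up by chain ID would treat the reordered `curl` and `git` layers of the second image as unknown, even though both diff IDs are already present locally.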

nightkr • Feb 13 '20 08:02

For a bit of perspective - if I remember correctly, the initial versions of Docker back in 2013 did store the layers separately, and used AUFS to merge all the layers (and add the read-write layer on top of it). Later (2015ish maybe?) when support for other storage drivers was added, this changed, because other union filesystems (like overlay and overlay2) don't support merging many layers like AUFS does. So when an image is pulled, it gets materialized into a single read-only layer.

My knowledge of Docker internals is out-of-date, so I don't know if the engine keeps the blobs that it downloads after having unpacked an image. If it does, this should be an easy fix; otherwise it might be more complicated because there might be some resistance given that this would increase disk usage :/

jpetazzo • Dec 24 '21 16:12

Hello there, I stumbled upon this and investigated a bit more. Nowadays Docker supports containerd as the underlying image store. That one does properly reuse layers regardless of their order 🎉!
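
As a rough illustration of why order stops mattering once blobs are looked up purely by digest, here is a toy Python sketch. The function name `blobs_to_fetch` and the digest strings are hypothetical, and this is a simplified model of a content-addressed store, not containerd's actual API.

```python
def blobs_to_fetch(manifest_digests, local_blobs):
    """Return the digests from a manifest that are not in the local store.

    Lookup is by digest alone, so a layer's position in the manifest
    plays no part in the decision.
    """
    return [d for d in manifest_digests if d not in local_blobs]


# Hypothetical digests of blobs already downloaded for a first image.
local_blobs = {"sha256:aaa", "sha256:bbb", "sha256:ccc"}

# A second image with the same layers in a different order, plus one new layer.
second_image = ["sha256:aaa", "sha256:ccc", "sha256:bbb", "sha256:ddd"]

missing = blobs_to_fetch(second_image, local_blobs)
print(missing)
```

In this model only the new blob is reported as missing, whatever order the known layers appear in.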

Not sure if it's relevant to this project, but posting here just in case.

aochagavia • Feb 29 '24 08:02