image Race condition when "skopeo copy" multiple tags into the same oci:directory at the same time

I'm not sure what guarantees skopeo gives with regard to races. See:

marek@mrnew:/tmp$ (skopeo copy docker://registry.fedoraproject.org/fedora:30 oci:image:30  &); (skopeo copy docker://registry.fedoraproject.org/fedora:32 oci:image:32 &); (skopeo copy docker://registry.fedoraproject.org/fedora:33 oci:image:33)

... wait for them to finish...

marek@mrnew:/tmp$ jq . < image/index.json |grep name
        "org.opencontainers.image.ref.name": "latest"
        "org.opencontainers.image.ref.name": "31"
        "org.opencontainers.image.ref.name": "33"

I would expect to see the tag "32" there as well, but I presume it raced with the other downloads. Is this expected? Is it okay to run multiple "skopeo copy" into "oci:dir" at the same time?

Aug 18 '20 17:08 majek

Thanks for your report.

Handling concurrent writes hasn’t been an explicit design goal so far, and is non-obvious to achieve in general on Linux (mandatory file locking is not available, advisory file locking is up to individual implementations, the usual temp file + rename trick breaks even that).

Basically c/image would have to invent its own private locking scheme for oci: directories, and hope that there isn’t any other concurrent writer.

Worse, there’s a design dichotomy between locking for the full duration of an operation (in which case the above series of copies would get no speed-up to speak of) and locking only for individual file writes (which would work for a group of add-only writers but could break pretty badly once something like #993 is added — blobs could be removed before an image is finished being written). There’s probably a way to design locking / in-progress state to support both fast concurrent writers and safety against concurrent deletes — but is that complexity really worth it?

So, at this point, I’d recommend serializing the Skopeo invocations; or maybe, if the goal is to transfer images using a file system, run a temporary docker/distribution server, copy images there, and transfer the backing storage of the server. That would mostly avoid the concurrent delete problem (because there isn’t a single index to serialize, and deletes are not enabled there either :) ) and, more importantly, preserve the original representation and digests of the images, not forcing a conversion to OCI.
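
One way to serialize the invocations without changing how they are launched is an advisory lock taken by a wrapper. This is only a sketch: `with_oci_lock` and the `.lock` file name are made up for illustration and are not part of skopeo. It uses flock(1) from util-linux and only protects against writers that go through the same wrapper. The lock is held for the full duration of each copy, so there is no speed-up from running the copies in parallel.

```shell
#!/bin/sh
# Hypothetical helper (not part of skopeo): run a command while holding an
# exclusive advisory lock associated with one oci: directory.
# Usage: with_oci_lock DIR COMMAND [ARGS...]
with_oci_lock() {
    # Strip a trailing slash so the lock file lands next to the directory.
    dir=${1%/}
    shift
    # The lock file lives beside the layout, not inside it, so the directory
    # itself stays a plain OCI layout. The ".lock" suffix is an assumption.
    # flock creates the file if needed and blocks until the lock is free.
    flock "${dir}.lock" "$@"
}

# Example: the copies from the report can still be started in the background;
# each one waits for the lock, so they effectively run one at a time.
#   for tag in 30 32 33; do
#       with_oci_lock image skopeo copy \
#           "docker://registry.fedoraproject.org/fedora:$tag" "oci:image:$tag" &
#   done
#   wait
```

If the copies are all started from one script anyway, a plain sequential loop does the same job with no locking at all.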

Aug 18 '20 19:08 mtrmac

Git is a great example of concurrent access, synced by disk, done right - so it is possible. "temporary distribution server" -> suggestions?

Aug 31 '20 10:08 majek

podman run -p 5000:5000 registry:2 with an appropriate storage volume, or the out-of-container equivalent.
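
Spelled out a bit more, as a rough sketch rather than a tested recipe: the container name `tmp-registry`, the `registry-data` directory, and the helper function names are assumptions made for illustration. `/var/lib/registry` is where the `registry:2` image keeps its data by default. On SELinux hosts the volume may additionally need a `:Z` suffix.

```shell
#!/bin/sh
# Assumed values for illustration; adjust as needed. The port in
# REGISTRY_ADDR must match the -p mapping used in start_registry.
REGISTRY_ADDR=${REGISTRY_ADDR:-localhost:5000}
REGISTRY_DATA=${REGISTRY_DATA:-$PWD/registry-data}

# Start a throwaway registry whose storage lives in $REGISTRY_DATA on the host.
start_registry() {
    mkdir -p "$REGISTRY_DATA"
    podman run -d --name tmp-registry -p 5000:5000 \
        -v "$REGISTRY_DATA:/var/lib/registry" registry:2
}

# Print the destination reference for a repository name and tag.
dest_ref() {
    printf 'docker://%s/%s:%s\n' "$REGISTRY_ADDR" "$1" "$2"
}

# Copy one Fedora tag into the temporary registry.
copy_tag() {
    # The local registry is served without TLS here, so TLS verification is
    # turned off for the destination only.
    skopeo copy --dest-tls-verify=false \
        "docker://registry.fedoraproject.org/fedora:$1" "$(dest_ref fedora "$1")"
}

# Example:
#   start_registry
#   for tag in 30 32 33; do copy_tag "$tag" & done; wait
#   podman rm -f tmp-registry    # then transfer $REGISTRY_DATA
```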

Aug 31 '20 18:08 mtrmac

> Worse, there’s a design dichotomy … could break pretty badly once something like #993 is added

After https://github.com/containers/image/pull/2003 , we do now support deleting images from an oci: destination. So any implementation would need to handle that.

Jan 02 '24 16:01 mtrmac