
skopeo sync: Support incremental updates to dir storage

Open lfarkas opened this issue 4 years ago • 21 comments

I'm trying to copy a full registry with skopeo and keep it in sync, but it's not possible, or at least it only works once.

for i in $(podman search --format "{{.Name}}" "$REGISTRY/"); do
	skopeo sync --scoped --all --src docker --dest dir "$i" "$LOCAL_DIR/"
done

Unfortunately, it works on the first run but not on the second. It would be very useful to have an overwrite option and a delete option, so that we could overwrite a local directory and keep it up to date with a given registry. Of course I can write a more complicated shell script and it'll work, but IMHO this would be a very useful feature.

lfarkas avatar Mar 31 '21 14:03 lfarkas

Thanks for reaching out. What is the exact error you are getting? Which version of Skopeo are you using?

vrothberg avatar Mar 31 '21 14:03 vrothberg

Skopeo 1.2.2. The error: Refusing to overwrite destination directory

lfarkas avatar Mar 31 '21 15:03 lfarkas

Thanks! Yes, skopeo sync refuses to overwrite existing directories on the host to prevent accidental data corruption. I think we can make the behavior conditional by adding a new flag (e.g., --dest-overwrite={true,false}).

@rhatdan, @flavio WDYT?

vrothberg avatar Apr 01 '21 10:04 vrothberg

Yes, it would be nice to have that, but I'm somehow concerned about the amount of code to introduce.

IMHO a "real" sync would probably require at least:

  • Ability to skip the download of the data that isn't changed
  • Ability to overwrite data that changed
  • Ability to add new data
  • Ability to remove data that is no longer around

It feels like we're reimplementing rsync :sweat_smile:
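
The four abilities listed above boil down to comparing two sets: what the registry has now and what the local directory has now. Below is a minimal sketch of that comparison, not anything skopeo does today. The function name `plan_sync` and the one-entry-per-line input files are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: classify entries as add / remove / keep.
# $1 = file listing what the registry has now (one entry per line)
# $2 = file listing what the local directory has now (one entry per line)
plan_sync() {
    remote_sorted=$(mktemp)
    local_sorted=$(mktemp)
    # comm requires sorted input; -u also drops duplicate lines.
    sort -u "$1" > "$remote_sorted"
    sort -u "$2" > "$local_sorted"

    # Lines only in the registry list: new data to download.
    comm -23 "$remote_sorted" "$local_sorted" | sed 's/^/add /'
    # Lines only in the local list: data that is no longer around.
    comm -13 "$remote_sorted" "$local_sorted" | sed 's/^/remove /'
    # Lines in both lists: unchanged, so the download can be skipped.
    comm -12 "$remote_sorted" "$local_sorted" | sed 's/^/keep /'

    rm -f "$remote_sorted" "$local_sorted"
}
```

If each line pairs a tag with its digest, a tag whose content changed shows up as one `remove` line (old digest) plus one `add` line (new digest), which covers the "overwrite data that changed" case.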

flavio avatar Apr 01 '21 12:04 flavio

Otherwise, what's the current use case for sync?

  • I'd like to sync a whole registry, e.g. registry.example.com/
  • or a part of it, e.g. registry.example.com/foo/
  • or a given image with all its versions, e.g. registry.example.com/foo/bar
  • or a given version, e.g. registry.example.com/foo/bar:1.2.3

In all cases, if I save it to a directory, I'd like the resulting directory to be 100% consistent with the registry image (or with the part of the registry that I sync).

Anyway, this is also the case when the destination is a docker registry.

lfarkas avatar Apr 01 '21 14:04 lfarkas

> Anyway, this is also the case when the destination is a docker registry.

No. This only happens for the dir: transport.

vrothberg avatar Apr 02 '21 10:04 vrothberg

> It feels like we're reimplementing rsync :sweat_smile:

:rofl: I think we should keep it simple. The destination will be removed and recreated (i.e., full overwrite)?

vrothberg avatar Apr 02 '21 10:04 vrothberg

For me, even that would be better than the current behaviour.

lfarkas avatar Apr 02 '21 12:04 lfarkas

@vrothberg You must be a better typist than me. --dest-overwrite={true,false}

How about --force?

rhatdan avatar Apr 02 '21 14:04 rhatdan

I love typing but --force sounds good as well :)

vrothberg avatar Apr 02 '21 14:04 vrothberg

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot] avatar Jun 03 '21 00:06 github-actions[bot]

my 2c is that making sync do a full overwrite (behind a --force flag or not) would be non-intuitive -- the word sync implies that it should only update the parts that are different.

If a full overwrite would be equivalent to rm -rf DEST || true followed by skopeo copy, then I think it's better for clients to do that themselves rather than invoke a sync command which (confusingly) does that for them.
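
For reference, the client-side equivalent described above could look roughly like the sketch below. It is an illustration only: the function name `recopy_image` and the overridable `SKOPEO` variable are assumptions, and it handles a single image rather than a whole repository.

```shell
#!/bin/sh
# SKOPEO can be overridden (e.g. for a dry run); defaults to the real binary.
SKOPEO=${SKOPEO:-skopeo}

# Hypothetical helper: replace a dir: copy of one image wholesale.
# $1 = image reference without transport, e.g. registry.example.com/foo/bar:1.2.3
# $2 = destination directory for that single image
recopy_image() {
    image=$1
    dest=$2
    # Refuse an empty path or the filesystem root before deleting anything.
    case "$dest" in
        ""|"/") echo "refusing to remove '$dest'" >&2; return 1 ;;
    esac
    # Full overwrite: drop whatever is there, then copy from scratch.
    rm -rf -- "$dest" || return 1
    $SKOPEO copy "docker://$image" "dir:$dest"
}
```

Setting `SKOPEO="echo skopeo"` before calling the function prints the copy command instead of running it. Note that the `rm -rf` step still runs in that case.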

bduffany avatar Aug 23 '21 19:08 bduffany

> I'm trying to copy a full registry with skopeo and keep it in sync.

I don’t think that’s quite a use case. A directory tree of image blobs is not useful in itself. What is that set of files used for? Copied back to yet another registry? Something else?

mtrmac avatar Aug 23 '21 19:08 mtrmac

> Thanks! Yes, skopeo sync refuses to overwrite existing directories on the host to prevent accidental data corruption. I think we can make the behavior conditional by adding a new flag (e.g., --dest-overwrite={true,false}).

… and then c/image/directory.newImageDestination would just notice a dir:-formatted directory with the right version number and erase all contents, and nothing would be gained right now. So without more work around c/image, that flag would be no better than rm -rf $dest; skopeo sync, AFAICS.

mtrmac avatar Aug 23 '21 19:08 mtrmac

> I love typing but --force sounds good as well :)

IMHO --force is a trap; users use it to override one sanity check but end up overriding others they didn’t mean to override (or that will have been added much later in a new version).

Anyway, let’s figure out 1) what is the need, and 2) what do we want to do about that need, before tinkering with the UI of an unknown feature.

mtrmac avatar Aug 23 '21 19:08 mtrmac

> Yes, it would be nice to have that, but I'm somehow concerned about the amount of code to introduce. … It feels like we're reimplementing rsync 😅

Yeah, I’m torn. Skopeo originally was a noddy wrapper around c/image with basically nothing to worry about or design; skopeo sync is very different from that idea. OTOH skopeo sync has also been very popular and useful for some users.

Pragmatically, I think clean PRs are welcome, and if contributors or drive-by users want to take it much further than originally anticipated (and the maintainers have the bandwidth to keep up with those PRs), that’s great. (By “clean PRs” I mean not to hand off quick hacks and technical-debt to others to maintain.)

Alternatively, some users may be much better served with calling c/image directly from a much larger program, e.g. a build system / pipeline that does already have a database of artifacts and their known locations. (Or Skopeo would be forked if the maintainers couldn’t do a good enough job, of course.)

E.g. skopeo sync has already caused a contribution of c/image/copy.Options.OptimizeDestinationImageAlreadyExists, which made sync much more efficient and potentially helps other c/image users as well.

mtrmac avatar Aug 23 '21 19:08 mtrmac

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot] avatar Jun 27 '22 00:06 github-actions[bot]

It seems like a necessary feature to me. Force overwrite seems like a good interim solution. I agree though, the terminology "sync" makes you think it is actually syncing.

wrender avatar Nov 03 '22 20:11 wrender

--force would seem overloaded in my mind. To be honest, I would assume it would do things like ignore TLS warnings, or ignore failed images, etc.

FruityWelsh avatar Dec 20 '22 22:12 FruityWelsh

Bump: Ran into this fairly hard when working on https://github.com/containers/podman/discussions/19796

Scenario: You need to sync a huge number of images across multiple registry namespaces. It breaks somewhere in the middle or right at the end. Or, something it previously sync'd has become corrupted for some reason or another.

Could skopeo sync be made to do some minimal checking on the destination, and if it's borkd in some obvious way, clobber and re-sync it?

I would also be in support of some kind of --force or --overwrite solution. Though less than ideal performance-wise, it would guarantee the "latest" stuff is actually synchronized.
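
A "minimal check" of the kind asked about above could be sketched as follows. This is not an existing skopeo feature. The helper name `dir_image_looks_ok` is hypothetical, and the layout it expects (a `version` file, a `manifest.json` file, and blobs named after their sha256 hex digest) is an assumption about the dir: transport that should be verified before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if $1 looks like an intact dir: image.
# Assumed layout (not confirmed in this thread): a non-empty "version" file,
# a non-empty "manifest.json", and blobs named after their sha256 hex digest.
dir_image_looks_ok() {
    dir=$1
    [ -s "$dir/manifest.json" ] || return 1
    [ -s "$dir/version" ] || return 1
    for blob in "$dir"/*; do
        name=$(basename "$blob")
        # Only files whose name is exactly 64 hex characters are treated as blobs.
        if printf '%s' "$name" | grep -Eq '^[0-9a-f]{64}$'; then
            actual=$(sha256sum "$blob" | cut -d' ' -f1)
            # A mismatch means the blob is truncated or corrupted.
            [ "$actual" = "$name" ] || return 1
        fi
    done
    return 0
}
```

A wrapper script could call this per image directory and only remove and re-sync the directories for which it fails, leaving intact ones alone. Hashing every blob is itself not free, so for very large mirrors the hash loop might be made optional.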

cevich avatar Sep 13 '23 14:09 cevich

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot] avatar Feb 12 '24 00:02 github-actions[bot]