force copying an image

From what we've observed, skopeo copy is smart and won't actually copy anything if the same image already exists at the destination. While this is desirable 99% of the time, we currently have a need to forcibly copy over the image anyway. Is there any way to do this?
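
For illustration, a typical invocation (the registry names here are placeholders) and the log line we see when nothing needs to be transferred:

```console
$ skopeo copy docker://registry.example.com/app:1.0 docker://123456789012.dkr.ecr.us-east-1.amazonaws.com/app:1.0
...
Copying blob dd1a79fb6ea3 skipped: already exists
```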

rittneje avatar Jan 13 '23 04:01 rittneje

Thanks for reaching out.

What do you mean by forcibly copy, and what are you actually trying to do? Is the goal to avoid layer reuse and to copy exactly the original layers? To somehow re-upload data that is corrupt on the destination? Something else?

Note that at least with the reference registry implementation, re-upload does nothing to fix pre-existing corrupt data: https://github.com/distribution/distribution/blob/362910506bc213e9bfc3e3e8999e0cfc757d34ba/registry/storage/blobwriter.go#L310-L314.

mtrmac avatar Jan 13 '23 12:01 mtrmac

We are copying images to AWS ECR. ECR has a feature where it will scan images for vulnerabilities when you push them. However, it also has a misfeature where once the image is old enough, the scan result expires. According to AWS, once this happens, the only way to get it to scan again is to re-push the image. But since skopeo copy seems to be a no-op, we are stuck.

rittneje avatar Jan 13 '23 12:01 rittneje

And that keys off of an upload of every individual layer, separately? The config? In Skopeo, redundant layer and config uploads are avoided, but manifests are always re-uploaded (except by skopeo sync, which assumes unchanged destinations).

Historically, forcing the layer and config uploads (i.e., not avoiding them) was not offered as an option because the reuse behavior seems unspecified; on a second look, though, that lack of specification might not actually be an issue, since we already assume automatic reuse for blobs that are being compressed.

Does removing the blob info cache (see debug log for location) make a difference?
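
For example (this is the default rootless location; the exact path depends on the version and environment, so confirm it against the --debug output):

```sh
# Assumed default rootless path; the --debug log prints the real location.
rm -f ~/.local/share/containers/cache/blob-info-cache-v1.boltdb
```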

mtrmac avatar Jan 13 '23 12:01 mtrmac

Unfortunately, AWS does not specify that level of detail. They just say you have to re-push. 😕

Is the blob info cache on the client side? We are pushing from within an ephemeral container, so it would not have any state preserved between calls to skopeo copy.

I might have to resort to skopeo delete + skopeo copy and hope for the best.
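
Something along these lines, with placeholder repository and registry names:

```sh
# Delete the tag from ECR, then copy it back from the source registry.
skopeo delete docker://123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
skopeo copy docker://artifactory.example.com/my-app:latest \
    docker://123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
```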

rittneje avatar Jan 13 '23 15:01 rittneje

Yes, the blob info cache is client-side. It typically makes a difference when the push modifies data (e.g. when pushing a build output, and compressing it in the process); not when copying unmodified images around.

skopeo --debug copy should be fairly verbose about the HTTP requests it makes.
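
For example (names are placeholders; note that --debug is a global flag and goes before the subcommand):

```sh
skopeo --debug copy docker://registry.example.com/app:1.0 \
    docker://123456789012.dkr.ecr.us-east-1.amazonaws.com/app:1.0
```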

I’m open in principle to adding a tunable, but for that we need to understand what to tune.

mtrmac avatar Jan 13 '23 15:01 mtrmac

@mtrmac I've asked AWS Support if they can provide more details on what is specifically required. I'll let you know what they say.

rittneje avatar Jan 16 '23 04:01 rittneje

They say we need to re-push the "complete" image. I guess that means all the layers?

Also, I noticed something odd. I just ran skopeo copy to copy from the ECR image to itself for testing purposes, and it logged several lines of the form Copying blob dd1a79fb6ea3 skipped: already exists. However, when our build job copies from Artifactory to ECR, it instead logs Copying blob sha256:dd1a79fb6ea3e89d51a4e210777fdc20a6a65c5deb9226774e6b1ac94367c67b. But based on the build time it definitely seems like it is skipping. Do you know what could be causing this discrepancy? Also why is one log using the short digest and the other the full digest?

rittneje avatar Jan 16 '23 13:01 rittneje

Compare the --debug log.

mtrmac avatar Jan 16 '23 14:01 mtrmac

@mtrmac What if I need to push the same blob but with a different image or tag? It refuses: Copying blob 090d1abdb0e8 skipped: already exists

MrFoxPro avatar Jul 30 '23 07:07 MrFoxPro

@MrFoxPro How does that make a difference? The blob exists on the registry, which is the effect you need, isn’t it?

If you want to push to a different tag, push to a different tag. This reuse just means that push is faster and uses less CPU, disk and network bandwidth.

Or is this also related to the AWS scanning trigger?

mtrmac avatar Jul 31 '23 14:07 mtrmac

@rittneje, did you manage to find a solution for your problem? We are running into the same issue when trying to enable AWS Inspector Enhanced scanning on older images.

MarienL1995 avatar Oct 11 '23 09:10 MarienL1995

Is it known what exactly needs to happen to trigger the AWS behavior?

I don’t know whether we would want to solve this by adding an option to the transport / CLI, or by building a separate (simple, slow) “upload all blobs” tool — but the first step needs to be understanding what makes the difference.

mtrmac avatar Oct 11 '23 13:10 mtrmac

@MarienL1995 We used the AWS CLI to delete the image (aws ecr batch-delete-image), and then skopeo to re-push it. That was enough to cause it to re-scan.
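
Concretely, something along these lines (repository name, tag, account, and region are placeholders):

```sh
# Delete the image from ECR...
aws ecr batch-delete-image \
    --repository-name my-app \
    --image-ids imageTag=latest

# ...then re-push it; this was enough to trigger a fresh scan.
skopeo copy docker://artifactory.example.com/my-app:latest \
    docker://123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
```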

rittneje avatar Oct 11 '23 19:10 rittneje
