force copying an image

From what we've observed, skopeo copy is smart and won't actually copy anything if the same image already exists at the destination. While this is desirable 99% of the time, we currently have a need to forcibly copy over the image anyway. Is there any way to do this?
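
For illustration, a typical invocation (the registry names here are placeholders) and the log line we see when nothing needs to be transferred:

```console
$ skopeo copy docker://registry.example.com/app:1.0 docker://123456789012.dkr.ecr.us-east-1.amazonaws.com/app:1.0
...
Copying blob dd1a79fb6ea3 skipped: already exists
```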

rittneje avatar Jan 13 '23 04:01 rittneje

Thanks for reaching out.

What do you mean by forcibly copy, and what are you actually trying to do? Is the goal to avoid layer reuse and to copy exactly the original layers? To somehow re-upload data that is corrupt on the destination? Something else?

Note that at least with the reference registry implementation, re-upload does nothing to fix pre-existing corrupt data: https://github.com/distribution/distribution/blob/362910506bc213e9bfc3e3e8999e0cfc757d34ba/registry/storage/blobwriter.go#L310-L314.

mtrmac avatar Jan 13 '23 12:01 mtrmac

We are copying images to AWS ECR. ECR has a feature where it will scan images for vulnerabilities when you push them. However, it also has a misfeature where once the image is old enough, the scan result expires. According to AWS, once this happens, the only way to get it to scan again is to re-push the image. But since skopeo copy seems to be a no-op, we are stuck.

rittneje avatar Jan 13 '23 12:01 rittneje

And that keys off of an upload of every individual layer, separately? The config? In Skopeo, redundant layer and config uploads are avoided, but manifests are always re-uploaded (except by skopeo sync, which assumes unchanged destinations).

Historically, forcing the layer and config uploads (i.e., not avoiding them) was not offered as an option because the reuse behavior seems unspecified; on a second look, though, that lack of specification might not actually be an issue, since we already assume automatic reuse for blobs that are being compressed.

Does removing the blob info cache (see debug log for location) make a difference?
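
For example (this is the default rootless location; the exact path depends on the version and environment, so confirm it against the --debug output):

```sh
# Assumed default rootless path; the --debug log prints the real location.
rm -f ~/.local/share/containers/cache/blob-info-cache-v1.boltdb
```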

mtrmac avatar Jan 13 '23 12:01 mtrmac

Unfortunately, AWS does not specify that level of detail. They just say you have to re-push. 😕

Is the blob info cache on the client side? We are pushing from within an ephemeral container, so it would not have any state preserved between calls to skopeo copy.

I might have to resort to skopeo delete + skopeo copy and hope for the best.
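
Something along these lines, with placeholder repository and registry names:

```sh
# Delete the tag from ECR, then copy it back from the source registry.
skopeo delete docker://123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
skopeo copy docker://artifactory.example.com/my-app:latest \
    docker://123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
```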

rittneje avatar Jan 13 '23 15:01 rittneje

Yes, the blob info cache is client-side. It typically makes a difference when the push modifies data (e.g. when pushing a build output, and compressing it in the process); not when copying unmodified images around.

skopeo --debug copy should be fairly verbose about the HTTP requests it makes.
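
For example (names are placeholders; note that --debug is a global flag and goes before the subcommand):

```sh
skopeo --debug copy docker://registry.example.com/app:1.0 \
    docker://123456789012.dkr.ecr.us-east-1.amazonaws.com/app:1.0
```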

I’m open in principle to adding a tunable, but for that we need to understand what to tune.

mtrmac avatar Jan 13 '23 15:01 mtrmac

@mtrmac I've asked AWS Support if they can provide more details on what is specifically required. I'll let you know what they say.

rittneje avatar Jan 16 '23 04:01 rittneje

They say we need to re-push the "complete" image. I guess that means all the layers?

Also, I noticed something odd. I just ran skopeo copy to copy from the ECR image to itself for testing purposes, and it logged several lines of the form Copying blob dd1a79fb6ea3 skipped: already exists. However, when our build job copies from Artifactory to ECR, it instead logs Copying blob sha256:dd1a79fb6ea3e89d51a4e210777fdc20a6a65c5deb9226774e6b1ac94367c67b. But based on the build time it definitely seems like it is skipping. Do you know what could be causing this discrepancy? Also why is one log using the short digest and the other the full digest?

rittneje avatar Jan 16 '23 13:01 rittneje

Compare the --debug log.

mtrmac avatar Jan 16 '23 14:01 mtrmac

@mtrmac What if I need to push the same blob but with a different image or tag? It refuses: Copying blob 090d1abdb0e8 skipped: already exists

MrFoxPro avatar Jul 30 '23 07:07 MrFoxPro

@MrFoxPro How does that make a difference? The blob exists on the registry, which is the effect you need, isn’t it?

If you want to push to a different tag, push to a different tag. This reuse just means that push is faster and uses less CPU, disk and network bandwidth.

Or is this also related to the AWS scanning trigger?

mtrmac avatar Jul 31 '23 14:07 mtrmac

@rittneje, did you manage to find a solution for your problem? We are running into the same issue when trying to enable AWS Inspector Enhanced scanning on older images.

MarienL1995 avatar Oct 11 '23 09:10 MarienL1995

Is it known what exactly needs to happen to trigger the AWS behavior?

I don’t know whether we would want to solve this by adding an option to the transport / CLI, or by building a separate (simple, slow) “upload all blobs” tool — but the first step needs to be understanding what makes the difference.

mtrmac avatar Oct 11 '23 13:10 mtrmac

@MarienL1995 We used the AWS CLI to delete the image (aws ecr batch-delete-image), and then skopeo to re-push it. That was enough to cause it to re-scan.
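
Concretely, something along these lines (repository name, tag, account, and region are placeholders):

```sh
# Delete the image from ECR...
aws ecr batch-delete-image \
    --repository-name my-app \
    --image-ids imageTag=latest

# ...then re-push it; this was enough to trigger a fresh scan.
skopeo copy docker://artifactory.example.com/my-app:latest \
    docker://123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
```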

rittneje avatar Oct 11 '23 19:10 rittneje
