lifecycle icon indicating copy to clipboard operation
lifecycle copied to clipboard

Workaround lack of delete API support on ECR

Open ipmb opened this issue 3 years ago • 8 comments

Description

AWS ECR does not support the delete image registry API used by pack. See https://github.com/aws/containers-roadmap/issues/1229

When publishing a cache image to an ECR registry, users get the following error:

Unable to delete previous cache image: DELETE https://xxx.dkr.ecr.us-east-1.amazonaws.com/v2/ecrrepository-ns9dbfvxadcv/manifests/sha256:a16e6b50e11222fe92b2e1adb357c351b352aed1f165254b56919dec23779a23: unexpected status code 404 Not Found: 404 page not found

It doesn't seem to cause any issues, but looks like a problem for end-users.

Proposed solution

Well ideally AWS would fix their implementation, but the issue has been open for a year with no movement. Perhaps pack could detect ECR repos (maybe by an URL regex 🤷) and skip this step or perform the deletion by other means.

Describe alternatives you've considered

I just ignore the log line and move on.

Additional context

  • [ ] This feature should be documented somewhere

ipmb avatar Jan 10 '22 18:01 ipmb

Thank you for raising this issue. @natalieparellano can you please move this to lifecycle?

aemengo avatar Jan 10 '22 18:01 aemengo

Adding this for discussion as we're not sure if we should just do a regex on ECR, or perhaps have this be a setting that the platform can pass in.

natalieparellano avatar Mar 09 '22 15:03 natalieparellano

Just brainstorming a few more alternatives:

  1. Log this at the debug level. Folks can find the info if they have questions but it won't be in your face on every build.

  2. Don't delete old cache images at all. Maybe deletion was too aggressive to begin with and we can let user/platforms manage image retention as they see fit? One could argue that keeping old cache images makes it possible to reproduce old builds (although there would never be a sane way to do this with volume caches).

ekcasey avatar Mar 17 '22 18:03 ekcasey

@ekcasey I like not deleting cache images at all. I wasn't sure anyone would go for that though 😄 We have code around here that essentially removes the log lines because of another registry that doesn't allow delete and it confuses users since there is no action to take.

jabrown85 avatar Mar 30 '22 15:03 jabrown85

Discussed during 3/30 sync - we think the best path forward is to not delete the cache image at all, leaving it up to the platform. We should gate this behavior on a platform api version, so the change is not unexpected. This would require a sub-team RFC.

natalieparellano avatar Mar 30 '22 16:03 natalieparellano

@natalieparellano have we done a sub-team RFC before? Are those in buildpacks/rfcs and just tagged a certain way? Happy to create the RFC, just not sure I've seen one.

jabrown85 avatar Mar 31 '22 15:03 jabrown85

@jabrown85 yep, sub team RFCs are just regular RFCs with the sub-team-rfc label. And the voting would be different (majority of implementation maintainers vs. core team).

natalieparellano avatar Mar 31 '22 17:03 natalieparellano

Opened RFC here.

jabrown85 avatar Mar 31 '22 19:03 jabrown85

I have updated the title of this issue to reflect the approved RFC

natalieparellano avatar Jan 05 '23 20:01 natalieparellano

FWIW, I just got this email from Amazon:

We are reaching out to tell you about a new method in an OCI API that Amazon ECR supports that affects your account. On February 10, 2023, ECR released support for the the "delete method" on the manifests API [1] which can be used to trigger the deletion of individual image manifests. Previously, tools that called this method on AWS ECR would receive a 404 response, whereas since February 10, 2023, these calls will succeed and trigger a deletion of the corresponding image manifest if it exists in the repository.

You have made one or more calls to the Delete Image Manifest OCI API on ECR during the week of January 22. To ensure that you experience no unexpected changes in behavior, this change does not affect your account yet, so that you can update your workflows if needed. On April 15, 2023, we will enable the change on your account, and calls to this method will start to succeed as expected. We recommend that you review and, if needed, update any existing workflows that will be affected by this change before April 15, 2023.

If you have any questions or concerns, including if you need extra time to prepare for this change or you want to be able to access the API sooner, please reach out to support [2].

[1] https://github.com/opencontainers/distribution-spec/blob/main/spec.md#content-management [2] https://aws.amazon.com/support

ipmb avatar Feb 15 '23 14:02 ipmb

The RFC was merged but it's not yet implemented :)

natalieparellano avatar Jun 28 '23 15:06 natalieparellano