kops icon indicating copy to clipboard operation
kops copied to clipboard

Add support for using ECR as pull-through image cache

Open rsafonseca opened this issue 1 year ago • 18 comments

This PR introduces a simple way to enable using ECR as a Pull-through image cache, without having to mutate images on the cluster using tools like Kyverno.

Containerd already has support for specifying registry mirrors in kops, but since ECR uses short lived tokens, it's not trivial (or even impossible without adding a few extra hacks on top of it) to use it as a pull-through cache.

This PR also bumps the ecr-credential-provider binary, which before version 1.29.0 specifically tried to parse an ECR repo URL from the image passed, leading to not being possible to enable this feature. This is now resolved in the latest versions.

This PR uses a flag to enable the feature when needed, and adds any server addresses configured in the mirrors to be allowed on the CredentialProviderConfig object that kops configures on the kubelet.

To configure this:

example:

spec:
  containerd:
    useECRCredentialsForMirrors: true
    registryMirrors:
      docker.io:
        - https://<MY-AWS-ACCOUNT>.dkr.ecr.us-east-1.amazonaws.com/v2/<MY cache rule namespace, e.g docker-hub>

rsafonseca avatar May 30 '24 12:05 rsafonseca

Hi @rsafonseca. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar May 30 '24 12:05 k8s-ci-robot

/ok-to-test

hakman avatar May 30 '24 16:05 hakman

Any idea why these e2e tests might be failing @hakman? It seems odd that they never come up and the log dump references not finding an unrelated role that should belong to another job e.g. pull-kops-e2e-cni-amazonvpc:

0530 19:12:47.346328   14816 dumplogs.go:51] /home/prow/go/src/k8s.io/kops/.build/dist/linux/amd64/kops toolbox dump --name e2e-pr16593.pull-kops-e2e-cni-amazonvpc.test-cncf-aws.k8s.io --dir /logs/artifacts --private-key /tmp/kops/e2e-pr16593.pull-kops-e2e-cni-amazonvpc.test-cncf-aws.k8s.io/id_ed25519 --ssh-user ubuntu
W0530 19:12:59.466784   50124 aws.go:2055] could not find role "masters.e2e-pr16593.pull-kops-e2e-cni-cilium-ipv6.test-cn-40qra0". Resource may already have been deleted: operation error IAM: GetInstanceProfile, https response error StatusCode: 404, RequestID: 013035b7-7eba-43a9-98dd-abf842145433, NoSuchEntity: Instance Profile masters.e2e-pr16593.pull-kops-e2e-cni-cilium-ipv6.test-cn-40qra0 cannot be found.
W0530 19:13:02.485316   50124 aws.go:2055] could not find role "nodes.e2e-pr16593.pull-kops-e2e-cni-cilium-ipv6.test-cncf-47o1f8". Resource may already have been deleted: operation error IAM: GetInstanceProfile, https response error StatusCode: 404, RequestID: 4bc7f2c4-9c38-4723-85bd-b493cf0794c8, NoSuchEntity: Instance Profile nodes.e2e-pr16593.pull-kops-e2e-cni-cilium-ipv6.test-cncf-47o1f8 cannot be found.

could something be leaking somewhere? or do you think there's any change here that could be somehow related to this weird behaviour?

EDIT: nvm, found the problem... hidden tabs in string literal.. damn vscode 😂 the above behaviour is still odd though, seems like the logs are leaking across jobs when they fail

rsafonseca avatar May 30 '24 20:05 rsafonseca

Please ignore the IPv6 tests

hakman avatar May 31 '24 08:05 hakman

All good then, though it looks like pull-kops-e2e-cni-cilium-eni is flaky, passed on retest. Any thoughts on this PR @hakman ? Does it make sense to you to add the config option under containerd on the CRD? :)

rsafonseca avatar May 31 '24 10:05 rsafonseca

Any update about this PR? Do you need any help or test, to be merged faster? I built the images locally from this repo and tested it on AWS. It's worked for me.

sczelo avatar Jun 21 '24 08:06 sczelo

Did you get a chance to review this yet @hakman @johngmyers ? :)

rsafonseca avatar Jun 21 '24 19:06 rsafonseca

@rifelpet @hakman @johngmyers is there any way we can help you guys to get it merged before/into the 1.30 release?

SzilardS avatar Jul 02 '24 12:07 SzilardS

This PR also bumps the ecr-credential-provider binary, which before version 1.29.0 specifically tried to parse an ECR repo URL from the image passed, leading to not being possible to enable this feature. This is now resolved in the latest versions.

@rsafonseca can you provide some more details about this? An issue or PR from the source repo?

elliotdobson avatar Sep 02 '24 23:09 elliotdobson

There is no specific issue regarding this, simply the failure logic was changed some commits ago to not error out when it fails to parse an ECR specific url. This is easy to test with the old vs newer binaries, the version that kops currently has in the assets will drop out and not pass the ecr auth token when the referenced image repo is not ECR, so when containerd tries to use ECR as a mirror it has no creds.

rsafonseca avatar Sep 03 '24 12:09 rsafonseca

Hey folks, can anyone take a look at this? :)

rsafonseca avatar Sep 26 '24 09:09 rsafonseca

/retest-required

rsafonseca avatar Oct 25 '24 13:10 rsafonseca

@hakman I've removed the for loop from the inline string literal as you had suggested in the office hours, and in fact i removed the whole string literal. Overall the number of lines increased, but it should be more readable and easier to manipulate in the future. https://github.com/kubernetes/kops/pull/16593/commits/2852bccdbee7291a8a5a2bc4e32f2a83a88a525b

rsafonseca avatar Nov 04 '24 18:11 rsafonseca

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Apr 20 '25 02:04 k8s-triage-robot

/remove-lifecycle stale

itskingori avatar Apr 22 '25 07:04 itskingori

rebased again, the number of files that needed their hashes updated inside tests/integration increased dramatically since the last rebase so the bot just moved this from L to XXL 🫠

rsafonseca avatar Jul 10 '25 19:07 rsafonseca

The tests are broken, so please ignore for now. Will fix them and merge this.

hakman avatar Jul 28 '25 11:07 hakman

/lgtm

hakman avatar Jul 28 '25 11:07 hakman

/hold for tests to be fixed

hakman avatar Jul 28 '25 11:07 hakman

/lgtm

hakman avatar Jul 28 '25 12:07 hakman

/unhold /retest

hakman avatar Jul 29 '25 09:07 hakman

/test all

hakman avatar Jul 29 '25 09:07 hakman

/approve

hakman avatar Jul 29 '25 09:07 hakman

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hakman

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Jul 29 '25 09:07 k8s-ci-robot

/retest

hakman avatar Jul 29 '25 10:07 hakman

@rsafonseca: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kops-e2e-aws-upgrade-k130-ko130-to-klatest-kolatest-many-addons 8b06aa4613fe4830b3706597bf0eac05664684c0 link false /test pull-kops-e2e-aws-upgrade-k130-ko130-to-klatest-kolatest-many-addons
pull-kops-kubernetes-e2e-ubuntu-gce-build 8b06aa4613fe4830b3706597bf0eac05664684c0 link false /test pull-kops-kubernetes-e2e-ubuntu-gce-build
pull-kops-e2e-cni-kuberouter 8b06aa4613fe4830b3706597bf0eac05664684c0 link false /test pull-kops-e2e-cni-kuberouter
pull-kops-e2e-cni-flannel a7839282130bb2e1cb5946c8abb0eb3aa8e32228 link true /test pull-kops-e2e-cni-flannel
pull-kops-e2e-gce-cni-kindnet a7839282130bb2e1cb5946c8abb0eb3aa8e32228 link true /test pull-kops-e2e-gce-cni-kindnet
pull-kops-e2e-k8s-aws-amazonvpc-u2404 a7839282130bb2e1cb5946c8abb0eb3aa8e32228 link true /test pull-kops-e2e-k8s-aws-amazonvpc-u2404
pull-kops-e2e-gce-cni-calico 1794614c19a9289b4bf30be4e854edee59ca5d25 link false /test pull-kops-e2e-gce-cni-calico

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

k8s-ci-robot avatar Jul 29 '25 10:07 k8s-ci-robot

/override pull-kops-e2e-cni-amazonvpc

hakman avatar Jul 29 '25 10:07 hakman

@hakman: Overrode contexts on behalf of hakman: pull-kops-e2e-cni-amazonvpc

In response to this:

/override pull-kops-e2e-cni-amazonvpc

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Jul 29 '25 10:07 k8s-ci-robot

/override pull-kops-e2e-k8s-gce-ipalias

hakman avatar Jul 29 '25 10:07 hakman

@hakman: Overrode contexts on behalf of hakman: pull-kops-e2e-k8s-gce-ipalias

In response to this:

/override pull-kops-e2e-k8s-gce-ipalias

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Jul 29 '25 10:07 k8s-ci-robot