Add support for using ECR as pull-through image cache
This PR adds a simple way to use ECR as a pull-through image cache without having to mutate images on the cluster using tools like Kyverno.
Containerd already supports registry mirrors in kops, but because ECR uses short-lived tokens, using it as a pull-through cache is not trivial, and is arguably impossible without a few extra hacks on top.
This PR also bumps the ecr-credential-provider binary. Versions before 1.29.0 specifically tried to parse an ECR repo URL from the image passed in, which made it impossible to enable this feature. This is resolved in the latest versions.
The feature is enabled with a flag when needed. When it is set, any server addresses configured in the mirrors are added to the images allowed on the CredentialProviderConfig object that kops configures on the kubelet.
To configure this:
- Create your pull-through cache rule in Amazon ECR
- Set spec.containerd.useECRCredentialsForMirrors to true
- Configure your mirrors in spec.containerd.registryMirrors
Example:

    spec:
      containerd:
        useECRCredentialsForMirrors: true
        registryMirrors:
          docker.io:
          - https://<MY-AWS-ACCOUNT>.dkr.ecr.us-east-1.amazonaws.com/v2/<MY cache rule namespace, e.g docker-hub>
Hi @rsafonseca. Thanks for your PR.
I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/ok-to-test
Any idea why these e2e tests might be failing @hakman? It seems odd that they never come up and the log dump references not finding an unrelated role that should belong to another job e.g. pull-kops-e2e-cni-amazonvpc:
I0530 19:12:47.346328 14816 dumplogs.go:51] /home/prow/go/src/k8s.io/kops/.build/dist/linux/amd64/kops toolbox dump --name e2e-pr16593.pull-kops-e2e-cni-amazonvpc.test-cncf-aws.k8s.io --dir /logs/artifacts --private-key /tmp/kops/e2e-pr16593.pull-kops-e2e-cni-amazonvpc.test-cncf-aws.k8s.io/id_ed25519 --ssh-user ubuntu
W0530 19:12:59.466784 50124 aws.go:2055] could not find role "masters.e2e-pr16593.pull-kops-e2e-cni-cilium-ipv6.test-cn-40qra0". Resource may already have been deleted: operation error IAM: GetInstanceProfile, https response error StatusCode: 404, RequestID: 013035b7-7eba-43a9-98dd-abf842145433, NoSuchEntity: Instance Profile masters.e2e-pr16593.pull-kops-e2e-cni-cilium-ipv6.test-cn-40qra0 cannot be found.
W0530 19:13:02.485316 50124 aws.go:2055] could not find role "nodes.e2e-pr16593.pull-kops-e2e-cni-cilium-ipv6.test-cncf-47o1f8". Resource may already have been deleted: operation error IAM: GetInstanceProfile, https response error StatusCode: 404, RequestID: 4bc7f2c4-9c38-4723-85bd-b493cf0794c8, NoSuchEntity: Instance Profile nodes.e2e-pr16593.pull-kops-e2e-cni-cilium-ipv6.test-cncf-47o1f8 cannot be found.
Could something be leaking somewhere? Or do you think any change here could somehow be related to this weird behaviour?
EDIT: nvm, found the problem... hidden tabs in a string literal.. damn vscode 😂 The behaviour above is still odd though, it seems like the logs are leaking across jobs when they fail.
Please ignore the IPv6 tests
All good then, though it looks like pull-kops-e2e-cni-cilium-eni is flaky, passed on retest. Any thoughts on this PR @hakman ? Does it make sense to you to add the config option under containerd on the CRD? :)
Any update on this PR? Do you need any help or testing to get it merged faster? I built the images locally from this repo and tested them on AWS. It worked for me.
Did you get a chance to review this yet @hakman @johngmyers ? :)
@rifelpet @hakman @johngmyers is there any way we can help you guys to get it merged before/into the 1.30 release?
> This PR also bumps the ecr-credential-provider binary. Versions before 1.29.0 specifically tried to parse an ECR repo URL from the image passed in, which made it impossible to enable this feature. This is resolved in the latest versions.
@rsafonseca can you provide some more details about this? An issue or PR from the source repo?
There is no specific issue for this; the failure logic was simply changed some commits ago so that it no longer errors out when it fails to parse an ECR-specific URL. This is easy to test with the old vs. newer binaries: the version that kops currently has in its assets bails out and does not pass the ECR auth token when the referenced image repo is not ECR, so when containerd tries to use ECR as a mirror it has no creds.
Hey folks, can anyone take a look at this? :)
/retest-required
@hakman I've removed the for loop from the inline string literal as you suggested in office hours, and in fact I removed the whole string literal. Overall the number of lines increased, but it should be more readable and easier to manipulate in the future. https://github.com/kubernetes/kops/pull/16593/commits/2852bccdbee7291a8a5a2bc4e32f2a83a88a525b
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the PR is closed
You can:
- Mark this PR as fresh with /remove-lifecycle stale
- Close this PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
rebased again, the number of files that needed their hashes updated inside tests/integration increased dramatically since the last rebase so the bot just moved this from L to XXL 🫠
The tests are broken, so please ignore for now. Will fix them and merge this.
/lgtm
/hold for tests to be fixed
/lgtm
/unhold
/retest
/test all
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: hakman
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [hakman]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
/retest
@rsafonseca: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-kops-e2e-aws-upgrade-k130-ko130-to-klatest-kolatest-many-addons | 8b06aa4613fe4830b3706597bf0eac05664684c0 | link | false | /test pull-kops-e2e-aws-upgrade-k130-ko130-to-klatest-kolatest-many-addons |
| pull-kops-kubernetes-e2e-ubuntu-gce-build | 8b06aa4613fe4830b3706597bf0eac05664684c0 | link | false | /test pull-kops-kubernetes-e2e-ubuntu-gce-build |
| pull-kops-e2e-cni-kuberouter | 8b06aa4613fe4830b3706597bf0eac05664684c0 | link | false | /test pull-kops-e2e-cni-kuberouter |
| pull-kops-e2e-cni-flannel | a7839282130bb2e1cb5946c8abb0eb3aa8e32228 | link | true | /test pull-kops-e2e-cni-flannel |
| pull-kops-e2e-gce-cni-kindnet | a7839282130bb2e1cb5946c8abb0eb3aa8e32228 | link | true | /test pull-kops-e2e-gce-cni-kindnet |
| pull-kops-e2e-k8s-aws-amazonvpc-u2404 | a7839282130bb2e1cb5946c8abb0eb3aa8e32228 | link | true | /test pull-kops-e2e-k8s-aws-amazonvpc-u2404 |
| pull-kops-e2e-gce-cni-calico | 1794614c19a9289b4bf30be4e854edee59ca5d25 | link | false | /test pull-kops-e2e-gce-cni-calico |
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
/override pull-kops-e2e-cni-amazonvpc
@hakman: Overrode contexts on behalf of hakman: pull-kops-e2e-cni-amazonvpc
/override pull-kops-e2e-k8s-gce-ipalias
@hakman: Overrode contexts on behalf of hakman: pull-kops-e2e-k8s-gce-ipalias