kube-image-keeper
Error when pulling image from private ECR
Hi, I have some deployments that use a private ECR registry, and I use a service account for credentials. I read that kuik has supported this since version 1.5.0, and I am currently on 1.7.1, but I am still getting an error.
➜ ~ k get cachedimages (trident-staging/default)
NAME CACHED RETAIN EXPIRES AT PODS COUNT AGE 1 13m
xxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com-crunchydata-crunchy-postgres-ubi8-15.5-0-tsl 1 3h9m
Error log from controller:
2024-03-26T07:30:07.285Z ERROR failed to cache image {"controller": "cachedimage", "controllerGroup": "kuik.enix.io", "controllerKind": "CachedImage", "CachedImage": {"name":"xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com-crunchydata-crunchy-postgres-ubi8-15.5-0-tsl"}, "namespace": "", "name": "xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com-crunchydata-crunchy-postgres-ubi8-15.5-0-tsl", "reconcileID": "d9432bcf-f245-4bdb-b942-8ffe8a2e872e", "sourceImage": "xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com/crunchydata/crunchy-postgres:ubi8-15.5-0-tsl", "error": "GET https://xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com/v2/crunchydata/crunchy-postgres/manifests/ubi8-15.5-0-tsl: unexpected status code 401 Unauthorized: Not Authorized\n", "errorCauses": [{"error": "GET https://xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com/v2/crunchydata/crunchy-postgres/manifests/ubi8-15.5-0-tsl: unexpected status code 401 Unauthorized: Not Authorized\n"}]}
2024-03-26T07:30:07.286Z ERROR Reconciler error {"controller": "cachedimage", "controllerGroup": "kuik.enix.io", "controllerKind": "CachedImage", "CachedImage": {"name":"xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com-crunchydata-crunchy-postgres-ubi8-15.5-0-tsl"}, "namespace": "", "name": "xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com-crunchydata-crunchy-postgres-ubi8-15.5-0-tsl", "reconcileID": "d9432bcf-f245-4bdb-b942-8ffe8a2e872e", "error": "GET https://xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com/v2/crunchydata/crunchy-postgres/manifests/ubi8-15.5-0-tsl: unexpected status code 401 Unauthorized: Not Authorized\n", "errorCauses": [{"error": "GET https://xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com/v2/crunchydata/crunchy-postgres/manifests/ubi8-15.5-0-tsl: unexpected status code 401 Unauthorized: Not Authorized\n"}]}
Do I need to add credentials somewhere?
Hi @riupie,
Where is your k8s cluster deployed: is it an EKS cluster, or something else?
- If EKS, the credentials should be managed automatically without any manipulation on our side
- If not, you should set the needed info in a specific pullSecret (standard methodology)
@Nicolasgouze I deployed on top of EKS.
If EKS, the credentials should be managed automatically without any manipulation on our side
That's weird. So what makes my image fail to pull? Should I put a service account on the registry deployment? Any idea what I should check?
Hello,
Sorry, but I couldn't reproduce this bug in my EKS setup. Does your EKS setup have anything particular? Can you pull this image without kuik? (You can use the value controllers.webhook.ignoredImages to ignore this specific image.)
Yes, I can pull the image without kuik, since it was already deployed from the start. I don't know if this will help you reproduce it, but I use IRSA to grant the access.
- Create a policy to access ECR
- Use the AWS module terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc to create an IAM role
- Create a service account and add an annotation to attach the role from point 2
- Attach the service account to each deployment that uses the private ECR
I'm not totally sure if this is related, but maybe this could help: https://github.com/awslabs/amazon-ecr-credential-helper/issues/581
the credentials should be managed automatically
@Nicolasgouze how does kuik authenticate and pull images from a private repository? As I remember, we didn't need to set up a role or pull secret on kuik itself, only on the related app deployment.
@riupie in v1.7.1, the caching mechanism is implemented in registry.go#L105-L122, which uses the registry.GetKeychains function to retrieve keychains based on the environment and the CachedImage's pull secrets. Authentication against ECR in an EKS cluster is done using the AWS helper (authn.NewKeychainFromHelper(ecrLogin.NewECRHelper())) in the registry.GetKeychains function. The implementation of this helper can be found here: https://github.com/awslabs/amazon-ecr-credential-helper
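As a rough illustration of the keychain idea described above, here is a minimal Python sketch. It is not kuik's Go code: every name in it (pull_secret_keychain, ecr_helper_keychain, resolve_credentials, fetch_token) is hypothetical, and it assumes that keychains are consulted in order and that a helper which cannot obtain a token simply yields no credentials, leaving the request anonymous.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical types for this sketch: a keychain maps a registry host to
# (username, password) credentials, or None when it has nothing to offer.
Credentials = Tuple[str, str]
Keychain = Callable[[str], Optional[Credentials]]


def pull_secret_keychain(secrets: Dict[str, Credentials]) -> Keychain:
    """Keychain backed by pull secrets already attached to the image."""
    def resolve(registry: str) -> Optional[Credentials]:
        return secrets.get(registry)
    return resolve


def ecr_helper_keychain(fetch_token: Callable[[str], str]) -> Keychain:
    """Keychain standing in for an ECR credential helper.

    fetch_token is a hypothetical callable that obtains an ECR token using
    whatever AWS credentials the environment exposes (for example the node's
    instance role, which is served by the instance metadata service).
    """
    def resolve(registry: str) -> Optional[Credentials]:
        if ".dkr.ecr." not in registry:
            return None  # not an ECR registry, nothing to offer
        try:
            token = fetch_token(registry)
        except OSError:
            # Assumption: a failing helper degrades to "no credentials"
            # instead of aborting the pull.
            return None
        return ("AWS", token)
    return resolve


def resolve_credentials(
    registry: str, keychains: Sequence[Keychain]
) -> Optional[Credentials]:
    """Return credentials from the first keychain that has some, else None."""
    for keychain in keychains:
        creds = keychain(registry)
        if creds is not None:
            return creds
    return None  # anonymous request; a private registry answers 401
```

Under these assumptions, a helper that cannot reach AWS credentials leaves resolve_credentials returning None, which would be consistent with the 401 Unauthorized in the controller log above.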
I don't know if this will help you reproduce it, but I use IRSA to grant the access.
- Create a policy to access ECR
- Use the AWS module terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc to create an IAM role
- Create a service account and add an annotation to attach the role from point 2
- Attach the service account to each deployment that uses the private ECR
Sorry, I need to revise my statement here. My EKS cluster doesn't use IRSA or pull secrets at all. I use an IAM role attached to each EKS node group for ECR access. Is that already supported?
I think I found the culprit.
When we use amazon-ecr-credential-helper, it calls the EC2 metadata service, right? In my case, I set HttpPutResponseHopLimit to 1 on EC2, which means the metadata service can only be accessed from the EC2 instance itself. Accessing the metadata service from a container needs 2 hops, which AWS rejects since I set the hop limit to 1.
It's just my hypothesis; I don't have any AWS account where I can customize the hop limit.
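One way to check this hypothesis is to request an IMDSv2 session token from inside a pod. The sketch below uses only the Python standard library; the function names are hypothetical, and it assumes the pod uses the regular pod network rather than the host network. With a hop limit of 1, the PUT response should never make it back to the container, so the call is expected to time out instead of returning a token.

```python
import urllib.request

# Link-local address of the EC2 instance metadata service (IMDS).
IMDS_TOKEN_URL = "http://169.254.169.254/latest/api/token"


def build_token_request(ttl_seconds: int = 60) -> urllib.request.Request:
    """Build the IMDSv2 session-token request, which is an HTTP PUT."""
    return urllib.request.Request(
        IMDS_TOKEN_URL,
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def imds_reachable(timeout: float = 2.0, opener=urllib.request.urlopen) -> bool:
    """Return True if a session token can be obtained from this network namespace.

    opener is injectable so the function can be exercised without a network.
    """
    try:
        with opener(build_token_request(), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and socket timeouts are both OSError subclasses.
        return False
```

Running imds_reachable() on the node itself and then inside a container on that node should show whether the hop limit is what blocks the credential helper.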
According to the AWS documentation, it is indeed recommended to set this value to 2: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-options.html
In a container environment, we recommend setting the hop limit to 2.
You should consider finding a way to set the hop limit to 2.
I'm closing the issue, since there is nothing we can do on our side: the problem comes from the instance metadata configuration in your AWS account.
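For readers landing here with the same problem, a minimal sketch of raising the hop limit on one instance follows. It assumes a boto3 EC2 client is passed in; raise_hop_limit is a hypothetical helper name, while modify_instance_metadata_options and its InstanceId and HttpPutResponseHopLimit parameters belong to the EC2 API.

```python
def raise_hop_limit(ec2_client, instance_id: str, hop_limit: int = 2) -> dict:
    """Set HttpPutResponseHopLimit on a single EC2 instance.

    ec2_client is expected to behave like boto3.client("ec2"); it is passed
    in so that this sketch has no import-time dependency on boto3.
    """
    if not 1 <= hop_limit <= 64:
        raise ValueError("hop limit must be between 1 and 64")
    return ec2_client.modify_instance_metadata_options(
        InstanceId=instance_id,
        HttpPutResponseHopLimit=hop_limit,
    )


# Usage (requires boto3 and AWS credentials; the instance ID is a placeholder):
#   import boto3
#   raise_hop_limit(boto3.client("ec2"), "i-0123456789abcdef0")
```

For an EKS node group this per-instance change is probably not durable, because replacement nodes take their settings from the launch template, so the metadata options likely need to be set there as well.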