kube-image-keeper Error when pulling image from private ECR

Hi, I have some deployments that use a private ECR registry. I use a service account for credentials. I read that kuik has supported this since version 1.5.0, and I am currently running 1.7.1, but I am still getting an error.

➜  ~ k get cachedimages    (trident-staging/default)
NAME                                                                                              CACHED   RETAIN   EXPIRES AT   PODS COUNT   AGE
                                                                                                                                 1            13m
xxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com-crunchydata-crunchy-postgres-ubi8-15.5-0-tsl                                       1            3h9m

Error log from controller:

2024-03-26T07:30:07.285Z        ERROR   failed to cache image   {"controller": "cachedimage", "controllerGroup": "kuik.enix.io", "controllerKind": "CachedImage", "CachedImage": {"name":"xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com-crunchydata-crunchy-postgres-ubi8-15.5-0-tsl"}, "namespace": "", "name": "xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com-crunchydata-crunchy-postgres-ubi8-15.5-0-tsl", "reconcileID": "d9432bcf-f245-4bdb-b942-8ffe8a2e872e", "sourceImage": "xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com/crunchydata/crunchy-postgres:ubi8-15.5-0-tsl", "error": "GET https://xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com/v2/crunchydata/crunchy-postgres/manifests/ubi8-15.5-0-tsl: unexpected status code 401 Unauthorized: Not Authorized\n", "errorCauses": [{"error": "GET https://xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com/v2/crunchydata/crunchy-postgres/manifests/ubi8-15.5-0-tsl: unexpected status code 401 Unauthorized: Not Authorized\n"}]}
2024-03-26T07:30:07.286Z        ERROR   Reconciler error        {"controller": "cachedimage", "controllerGroup": "kuik.enix.io", "controllerKind": "CachedImage", "CachedImage": {"name":"xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com-crunchydata-crunchy-postgres-ubi8-15.5-0-tsl"}, "namespace": "", "name": "xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com-crunchydata-crunchy-postgres-ubi8-15.5-0-tsl", "reconcileID": "d9432bcf-f245-4bdb-b942-8ffe8a2e872e", "error": "GET https://xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com/v2/crunchydata/crunchy-postgres/manifests/ubi8-15.5-0-tsl: unexpected status code 401 Unauthorized: Not Authorized\n", "errorCauses": [{"error": "GET https://xxxxxxxxxx.dkr.ecr.ap-southeast-3.amazonaws.com/v2/crunchydata/crunchy-postgres/manifests/ubi8-15.5-0-tsl: unexpected status code 401 Unauthorized: Not Authorized\n"}]}

Do I need to add credentials somewhere?

Mar 26 '24 07:03 riupie

Hi @riupie,

Where is your k8s cluster deployed: is it an EKS cluster, or something else?
* If EKS, the credentials should be managed automatically without any manipulation on our side
* If not, you should put the required credentials in a dedicated pullSecret (the standard approach)

Mar 29 '24 13:03 Nicolasgouze

@Nicolasgouze I deployed on top of EKS.

If EKS, the credentials should be managed automatically without any manipulation on our side

That's weird. So what makes the image pull fail? Should I attach the service account to the registry deployment? Any idea what I should check?

Apr 02 '24 13:04 riupie

Hello,

Sorry, but I couldn't reproduce this bug in my EKS setup. Is there anything particular about your EKS setup? Can you pull this image without kuik? (You can use the value controllers.webhook.ignoredImages to make kuik ignore this specific image.)

Apr 19 '24 14:04 plaffitt

Yes, I can pull the image without kuik; it has been deployed that way from the start. I don't know if this will help you to reproduce or not but I use IRSA to grant the access.

1. Create a policy to access ECR.
2. Use the AWS module terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc to create an IAM role.
3. Create a service account and add an annotation to attach the role from point 2.
4. Attach the service account to each deployment that uses the private ECR.

Apr 23 '24 12:04 riupie

I'm not totally sure this is related, but maybe it could help: https://github.com/awslabs/amazon-ecr-credential-helper/issues/581

May 07 '24 09:05 plaffitt

the credentials should be managed automatically

@Nicolasgouze how does kuik authenticate and pull images from a private repository? As far as I remember, we didn't need to set up a role or pull secret on kuik itself, only on the related app deployment.

May 17 '24 04:05 riupie

@riupie in v1.7.1, the caching mechanism is implemented in registry.go#L105-L122, which uses the registry.GetKeychains function to retrieve keychains based on the environment and the CachedImage's pull secrets. Authentication against ECR in an EKS cluster is done using the AWS helper (authn.NewKeychainFromHelper(ecrLogin.NewECRHelper())) in the registry.GetKeychains function. The implementation of this helper can be found here: https://github.com/awslabs/amazon-ecr-credential-helper

May 17 '24 09:05 plaffitt

I don't know if this will help you to reproduce or not but I use IRSA to grant the access.

1. Create a policy to access ECR.
2. Use the AWS module terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc to create an IAM role.
3. Create a service account and add an annotation to attach the role from point 2.
4. Attach the service account to each deployment that uses the private ECR.

Sorry, I need to correct my statement here. My EKS cluster doesn't use IRSA or pull secrets at all. I use an IAM role attached to each EKS node group for ECR access. Is that already supported?

May 17 '24 11:05 riupie

I think I found the culprit. amazon-ecr-credential-helper calls the EC2 instance metadata service, right? In my case, HttpPutResponseHopLimit is set to 1 on the EC2 instances, which means the metadata service can only be reached from the instance itself. Reaching it from a container takes 2 hops, so the response is dropped by AWS because the hop limit is 1. This is only a hypothesis: I don't have an AWS account where I can change the hop limit.
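One way to test this hypothesis from inside a pod is to request an IMDSv2 session token directly. The sketch below is only illustrative and assumes IMDSv2 is in use; the function name imds_token_reachable is made up for this example, while the endpoint and the X-aws-ec2-metadata-token-ttl-seconds header are the documented IMDSv2 ones.

```python
import http.client
import urllib.error
import urllib.request

# Standard link-local address of the EC2 instance metadata service (IMDS).
IMDS_TOKEN_URL = "http://169.254.169.254/latest/api/token"


def imds_token_reachable(url: str = IMDS_TOKEN_URL, timeout: float = 2.0) -> bool:
    """Return True if an IMDSv2 session token can be obtained from `url`.

    Hypothetical helper for this thread, not part of kuik or the ECR helper.
    """
    request = urllib.request.Request(
        url,
        method="PUT",
        # Ask for a short-lived token; 60 seconds is enough for a probe.
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200 and len(response.read()) > 0
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers timeouts, refused connections and HTTP error statuses.
        # If the hop limit is too low, the token response would not reach
        # the container and the request is expected to time out.
        return False
```

Calling imds_token_reachable() on the node itself and then from a container on the pod network should give different results if the hop limit is the cause: True on the node, False in the pod.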

May 17 '24 12:05 riupie

According to the AWS documentation, it is indeed recommended to set this value to 2: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-options.html

In a container environment, we recommend setting the hop limit to 2.

You should consider finding a way to set the hoplimit to 2.
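As a rough sketch, the hop limit of a running instance could be changed through the EC2 ModifyInstanceMetadataOptions API. The helper names below (hop_limit_params, set_hop_limit) are hypothetical; the client is assumed to be a boto3 EC2 client and is passed in as an argument so that the snippet itself has no dependency on boto3.

```python
from typing import Any, Dict


def hop_limit_params(instance_id: str, hop_limit: int = 2) -> Dict[str, Any]:
    """Build the arguments for EC2's ModifyInstanceMetadataOptions call."""
    # The EC2 API accepts hop limits from 1 to 64.
    if not 1 <= hop_limit <= 64:
        raise ValueError("hop limit must be between 1 and 64")
    return {
        "InstanceId": instance_id,
        "HttpPutResponseHopLimit": hop_limit,
        "HttpEndpoint": "enabled",
    }


def set_hop_limit(ec2_client: Any, instance_id: str, hop_limit: int = 2) -> Any:
    """Apply the hop limit using a boto3 EC2 client (assumed, passed in by the caller)."""
    return ec2_client.modify_instance_metadata_options(
        **hop_limit_params(instance_id, hop_limit)
    )
```

With boto3 installed, set_hop_limit(boto3.client("ec2"), "i-0123456789abcdef0") would update a single instance (the instance ID is a placeholder). Since node group instances get replaced over time, the same setting probably also needs to go into the metadata options of the node group's launch template.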

I am closing the issue since there is nothing we can do on our side, considering that the problem comes from a misconfiguration in your AWS account.

May 17 '24 15:05 plaffitt
