Builder fails on ECR when using dockerconfigjson file

Open wdonne opened this issue 1 year ago • 6 comments

Hello,

With ECR you can use AWS as the username and an authentication token as the password. You can put this in a dockerconfigjson file like this:

{
  ".dockerconfigjson": {
    "auths": {
      "https://<account>.dkr.ecr.<region>.amazonaws.com/v2/<repository>": {
        "username": "AWS",
        "password": "XXXXX ECR Authorization Token XXXXX"
      }
    }
  }
}

If you put that in a Kubernetes secret of type kubernetes.io/dockerconfigjson and attach it to the kpack service account as both a secret and an image pull secret, then the Builder that uses that service account will produce the following error:

status:
  conditions:
    - lastTransitionTime: '2023-05-26T11:53:54Z'
      message: >-
        Post
        "https://<account>.dkr.ecr.<region>.amazonaws.com/v2/<repository>/blobs/uploads/":
        EOF
      status: 'False'
      type: Ready

The logs in the kpack controller show this:

{
  "level":"error",
  "ts":"2023-05-26T11:54:00.413273068Z",
  "logger":"controller",
  "caller":"controller/controller.go:566",
  "msg":"Reconcile error",
  "commit":"79126fe-dirty",
  "knative.dev/kind":"builders.kpack.io",
  "knative.dev/traceid":"f787f9bb-f774-4dd3-a65e-8e00b519d2f3",
  "knative.dev/key":"play/weblogic-ai-builder",
  "duration":5.782489848,
  "error":"Post \"https://<account>.dkr.ecr.<region>.amazonaws.com/v2/<repository>/blobs/uploads/\": EOF",
  "stacktrace":"knative.dev/pkg/controller.(*Impl).handleErr\n\tknative.dev/[email protected]/controller/controller.go:566\nknative.dev/pkg/controller.(*Impl).processNextWorkItem\n\tknative.dev/[email protected]/controller/controller.go:543\nknative.dev/pkg/controller.(*Impl).RunContext.func3\n\tknative.dev/[email protected]/controller/controller.go:491"
}

The AWS policy in the role I generated the authorization token from was the following:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": [
                "ecr:BatchGetImage",
                "ecr:DescribeRepositories",
                "ecr:ListTagsForResource",
                "ecr:PutImage",
                "ecr:UploadLayerPart",
                "ecr:CompleteLayerUpload",
                "ecr:InitiateLayerUpload",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer"
            ],
            "Resource": "arn:aws:ecr:<region>:<account>:repository/<repository>"
        },
        {
            "Sid": "",
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken"
            ],
            "Resource": "*"
        }
    ]
}

wdonne avatar May 26 '23 12:05 wdonne

Note that for the cluster stack and the cluster store this works fine.

I have also tried an ECR policy with ecr:* as the action, meaning it can do anything with the repository, but that doesn't change anything.

wdonne avatar May 26 '23 14:05 wdonne

I forgot to mention that this is with version 0.10.1. I noticed that this release file uses the 0.10.1-rc.3 version of the images for the Deployment resources.

wdonne avatar May 31 '23 07:05 wdonne

Huh, these all look correct to me. For sanity's sake, can you check the following:

  1. Are the ClusterStack and ClusterStore pointing to private ECR images or public images? If they're public images, then unfortunately that doesn't tell us much about the ECR creds.
  2. Does the ECR repository the Builder is pointing to exist? ECR has an annoying policy of requiring repos to be created explicitly instead of creating them dynamically on push, as Docker Hub or GCR do.
  3. Was the Builder created after the service account and secret? I'm not 100% sure, but I think the controller doesn't re-reconcile Builders on service account changes.

Can you also try using just the hostname in the dockerconfig? Something like:

      "auths": {
        "<account>.dkr.ecr.<region>.amazonaws.com": {
          "username": "AWS",
          "password": "XXXXX ECR Authorization Token XXXXX"
        }
      }
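
For anyone who wants to generate such a hostname-keyed entry rather than edit it by hand, here is a minimal Python sketch. The helper names `registry_host` and `build_docker_config` and the sample account, region, and repository are made up for illustration. The extra `auth` field (base64 of `username:password`) mirrors what `docker login` writes to its config file. Nothing in this thread confirms that the key format is what triggers the EOF error, so treat this as something to try, not a verified fix.

```python
import base64
import json
from urllib.parse import urlparse


def registry_host(registry: str) -> str:
    """Reduce a registry URL such as https://host/v2/repo to just its host."""
    # urlparse only fills in netloc when a scheme (or a leading //) is present,
    # so bare hostnames get a "//" prefix before parsing.
    parsed = urlparse(registry if "://" in registry else "//" + registry)
    return parsed.netloc


def build_docker_config(registry: str, password: str, username: str = "AWS") -> dict:
    """Return a Docker config dict with a single hostname-keyed auths entry."""
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {
        "auths": {
            registry_host(registry): {
                "username": username,
                "password": password,
                "auth": auth,
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical account, region, repository, and password.
    config = build_docker_config(
        "https://123456789012.dkr.ecr.eu-west-1.amazonaws.com/v2/my-repo",
        "example-password",
    )
    print(json.dumps(config, indent=2))
```

The function accepts either the full URL or a bare hostname, so an existing URL-style key can be passed in unchanged.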

chenbh avatar Jul 05 '23 21:07 chenbh

Hey @wdonne, we are facing similar issues with ECR put permissions too. Did you find any workaround?

semmet95 avatar Jul 19 '23 10:07 semmet95

Hi @semmet95 ,

I haven't pursued this further yet, but the only possible thing I see is using the domain name instead of the URL in the dockerconfig.

wdonne avatar Jul 19 '23 13:07 wdonne

@wdonne For me your approach worked when I created the secret from the .docker/config.json file that was written after logging in to ECR with an IAM role that has the proper policies.

kubectl create secret generic regcred --from-file=.dockerconfigjson=/Users/amisingh/temp/.docker/config.json --type=kubernetes.io/dockerconfigjson
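
For reference, a Secret created this way stores the file content base64-encoded under the `.dockerconfigjson` data key, and that file content has `auths` at its top level. A minimal Python sketch of the equivalent manifest follows. The function name `docker_config_secret` and the sample registry values are hypothetical, and the output is a plain dict that would still need to be serialized and applied to the cluster.

```python
import base64
import json


def docker_config_secret(name: str, docker_config: dict) -> dict:
    """Build a kubernetes.io/dockerconfigjson Secret manifest as a dict."""
    # Secret "data" values must be base64-encoded; the payload is the Docker
    # config itself, with "auths" as the top-level key.
    encoded = base64.b64encode(json.dumps(docker_config).encode()).decode()
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "kubernetes.io/dockerconfigjson",
        "data": {".dockerconfigjson": encoded},
    }


if __name__ == "__main__":
    # Hypothetical registry host and password, for illustration only.
    config = {
        "auths": {
            "123456789012.dkr.ecr.eu-west-1.amazonaws.com": {
                "username": "AWS",
                "password": "example-password",
            }
        }
    }
    print(json.dumps(docker_config_secret("regcred", config), indent=2))
```

Since kubectl accepts JSON manifests, the printed output can be saved to a file and applied with `kubectl apply -f`.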

semmet95 avatar Jul 21 '23 05:07 semmet95