Kaniko fails to pull from GCR in Gitlab

Open jgsuess opened this issue 1 year ago • 13 comments

Actual behavior

In gitlab, kaniko obtains the manifest of an image, but fails to obtain the image from GCR.

Expected behavior

In gitlab, kaniko obtains the image from GCR.

To Reproduce

Steps to reproduce the behavior:

  1. In a Gitlab repository, create the files below.
  2. Observe the build failure.

.gitlab-ci.yml:

build:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:v1.23.2-debug
    entrypoint: [""]
  script:
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_TAG}"

Dockerfile:

FROM gcr.io/distroless/java17-debian12:nonroot

Additional Information

  • Dockerfile Please provide either the Dockerfile you're trying to build or one that can reproduce this error.
  • Build Context Please provide or clearly describe any files needed to build the Dockerfile (ADD/COPY commands)
  • Kaniko Image (fully qualified with digest)
Using docker image sha256:16b383e1c3b259d59f75a2720a45ccf15b3a716cef44c6a5c521ceb471117168 for gcr.io/kaniko-project/executor:v1.23.2-debug with digest gcr.io/kaniko-project/executor@sha256:c3109d5926a997b100c4343944e06c6b30a6804b2f9abe0994d3de6ef92b028e ...
$ /kaniko/executor --context "${CI_PROJECT_DIR}" --dockerfile "${CI_PROJECT_DIR}/Dockerfile" --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_TAG}"
INFO[0000] Using dockerignore file: /builds/australian-e-health-research-centre/digital-health-strengthening-standards-capability/infrastructure/containers/hapi-fhir-jpa-server/.dockerignore 
INFO[0000] Retrieving image manifest gcr.io/distroless/java17-debian12:nonroot 
INFO[0000] Retrieving image gcr.io/distroless/java17-debian12:nonroot from registry gcr.io 
error building image: unable to complete operation after 0 attempts, last error: GET https://gcr.io/v2/token?scope=repository%3Adistroless%2Fjava17-debian12%3Apull&service=gcr.io: UNAUTHORIZED: authentication failed

Triage Notes for the Maintainers

Description: Yes/No
  • Please check if this is a new feature you are proposing: [ ]
  • Please check if the build works in docker but not in kaniko: [x]
  • Please check if this error is seen when you use --cache flag: [ ]
  • Please check if your dockerfile is a multistage dockerfile: [ ]

jgsuess avatar Oct 02 '24 03:10 jgsuess

When running as a gitlab job, kaniko automatically sets authentication for you based on the predefined variables:

  • CI_REGISTRY
  • CI_REGISTRY_USER
  • CI_REGISTRY_PASSWORD

If you need different credentials either inside gitlab or with other registries, you must manually set these credentials.

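For example:

build:
  variables:
    # "auth" holds the base64 encoding of "user:password" for the registry.
    DOCKER_CONFIG_JSON: |
      {
          "auths":{
              "$MY_REGISTRY":{
                  "auth":"$MY_AUTH"
              }
          }
      }
  before_script:
    # Quote the variable so the shell passes the JSON through unchanged.
    - echo "$DOCKER_CONFIG_JSON" > /kaniko/.docker/config.json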

That's the downside of doing things implicitly: it works out of the box until it doesn't, and then everybody gets confused.

mzihlmann avatar Oct 15 '24 03:10 mzihlmann

It looks like the OP is trying to fetch gcr.io/distroless/java17-debian12:nonroot which doesn't need authentication.

This is a duplicate of https://github.com/GoogleContainerTools/kaniko/issues/1984

jameshartig avatar Oct 15 '24 18:10 jameshartig

Indeed! Sorry for not reading carefully enough. For me, however, this only happens on gitlab.com with gitlab.com runners (docker). On our self-hosted GitLab with self-hosted runners (k8s), it is not reproducible. I experimented a bit and found this workaround:

build:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:v1.23.2-debug
    entrypoint: [""]
  variables:
    FF_NETWORK_PER_BUILD: true
  script:
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_TAG}"

The workaround relies on the FF_NETWORK_PER_BUILD feature flag, but I have not yet understood why it helps.

mzihlmann avatar Oct 16 '24 02:10 mzihlmann

As the other thread mentioned, this works with 1.7.0 and fails starting from 1.8.0. A good opportunity for some good old git bisect action: https://github.com/GoogleContainerTools/kaniko/compare/v1.7.0...v1.8.0

At first glance, this commit looks a bit suspicious: https://github.com/GoogleContainerTools/kaniko/commit/09e70e44d9e9a3fecfcf70cb809a654445837631

When I short-circuit the resolve function, the build indeed no longer fails:

func resolve(ctx context.Context) authn.Authenticator {
	return authn.Anonymous
}

mzihlmann avatar Oct 16 '24 03:10 mzihlmann

Running git bisect led me to this commit: https://github.com/GoogleContainerTools/kaniko/commit/633f555c5c13fc1ef08f819cf60a93d19cd44081. I am currently looking at the changes in that commit; it was meant to fix https://github.com/GoogleContainerTools/kaniko/pull/1856

mzihlmann avatar Oct 17 '24 04:10 mzihlmann

Looks like when the registry is gcr.io or pkg.dev it ends up calling google.FindDefaultCredentialsWithParams which ends up talking to the GCE metadata server to pull credentials. I speculate this is happening and it's getting an unusable token while running on GitLab's runners.

We ended up fixing this by setting GOOGLE_APPLICATION_CREDENTIALS="/dev/null", which short-circuits that function and returns an error, causing the Google keychain to fall back to anonymous.

jameshartig avatar Oct 17 '24 19:10 jameshartig

Sorry for not posting this yesterday. I was contacting GitLab to make sure it is OK for me to write this message publicly, but now that the cat is out of the bag, I must say it's quite hilarious: kaniko does what it is supposed to do perfectly well in this case :smile:.

Luckily, the access token it receives can't be used for much, i.e. downloading images doesn't work :rofl:. I tried to write the issue report into their logging system but got stuck because I couldn't guess their logstream name :rofl:.

mzihlmann avatar Oct 17 '24 22:10 mzihlmann

I think we can still improve the out-of-the-box situation for users. The fundamental problem is that if authentication works and a token is received, it is of course used, but if the permissions on the token don't allow image pulling, we simply give up. We never try to run without a token as a fallback. Even better, when we first request the token, we should already set the correct scope; the request should then be denied, and everything should work as expected.

mzihlmann avatar Oct 17 '24 23:10 mzihlmann

I agree the situation can be improved. Currently the list of credential sources is hardcoded. Could that be a flag instead so it could be customized if you don't need specific ones, like Google, for example?

jameshartig avatar Oct 18 '24 20:10 jameshartig

At the moment, what is the best recommended workaround for this? It should probably be added to the other bug and added to the documentation on gitlab while this continues. Using the public GCR would likely be a very common application case.

jgsuess avatar Oct 18 '24 22:10 jgsuess

Good morning,

I have not yet received an explicit OK from GitLab, but so far they have said they are not concerned, so I think you're right that it's high time to inform the other channel. I think @jameshartig's solution is the neatest so far, as enabling the feature flag affects a lot more than just this bug; at the least, it also makes runners take much longer to start on their side (hence it is not the default).

> I agree the situation can be improved. Currently the list of credential sources is hardcoded. Could that be a flag instead so it could be customized if you don't need specific ones, like Google, for example?

At the risk of repeating myself:

> That's the downside of doing things implicitly: it works out of the box until it doesn't, and then everybody gets confused.

But I think that, even though the change would be easy, it would be very difficult to get it merged, as it breaks both the user interface and the philosophy.

mzihlmann avatar Oct 18 '24 23:10 mzihlmann

I would agree that implicit behaviour is a bad idea. Good documentation that explains how this happens would be the preferred choice from my perspective. Otherwise, when you start trying to understand how it does something, you end up asking "So how does that even work?" and the behaviour becomes counterintuitive.

jgsuess avatar Oct 19 '24 00:10 jgsuess

I think the default could be what is hardcoded now to not make it more complicated for anyone who is fine with the current keychain. Something like --cred-sources with a default value of "google,ecr,acr,gitlab".
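
A rough sketch of how such a flag could be parsed, keeping today's hardcoded list as the default. The --cred-sources flag does not exist in kaniko; it is only the proposal from this comment, and `parseCredSources` and `defaultCredSources` are hypothetical names. The sketch uses Go's standard flag package purely for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// defaultCredSources mirrors the default value proposed in the comment above.
const defaultCredSources = "google,ecr,acr,gitlab"

// parseCredSources splits the comma-separated value and rejects unknown names.
func parseCredSources(value string) ([]string, error) {
	known := map[string]bool{"google": true, "ecr": true, "acr": true, "gitlab": true}
	var sources []string
	for _, name := range strings.Split(value, ",") {
		name = strings.TrimSpace(name)
		if name == "" {
			continue
		}
		if !known[name] {
			return nil, fmt.Errorf("unknown credential source %q", name)
		}
		sources = append(sources, name)
	}
	return sources, nil
}

func main() {
	fs := flag.NewFlagSet("executor", flag.ContinueOnError)
	credSources := fs.String("cred-sources", defaultCredSources,
		"comma-separated credential sources to enable (hypothetical flag)")
	// Example invocation that leaves out the Google source.
	if err := fs.Parse([]string{"--cred-sources=ecr,acr,gitlab"}); err != nil {
		fmt.Println(err)
		return
	}
	sources, err := parseCredSources(*credSources)
	fmt.Println(sources, err)
}
```

Anyone who omits the flag would get the same four sources as now, so existing pipelines would keep their current behaviour.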

jameshartig avatar Oct 19 '24 01:10 jameshartig