Kaniko fails to pull from GCR in Gitlab
Actual behavior
In gitlab, kaniko obtains the manifest of an image, but fails to obtain the image from GCR.
Expected behavior
In gitlab, kaniko obtains the image from GCR.
To Reproduce
Steps to reproduce the behavior:
- In a Gitlab repository, create the files below.
- Observe build failure
build:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:v1.23.2-debug
    entrypoint: [""]
  script:
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_TAG}"

FROM gcr.io/distroless/java17-debian12:nonroot
Additional Information
- Dockerfile Please provide either the Dockerfile you're trying to build or one that can reproduce this error.
- Build Context Please provide or clearly describe any files needed to build the Dockerfile (ADD/COPY commands)
- Kaniko Image (fully qualified with digest)
Using docker image sha256:16b383e1c3b259d59f75a2720a45ccf15b3a716cef44c6a5c521ceb471117168 for gcr.io/kaniko-project/executor:v1.23.2-debug with digest gcr.io/kaniko-project/executor@sha256:c3109d5926a997b100c4343944e06c6b30a6804b2f9abe0994d3de6ef92b028e ...
$ /kaniko/executor --context "${CI_PROJECT_DIR}" --dockerfile "${CI_PROJECT_DIR}/Dockerfile" --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_TAG}"
INFO[0000] Using dockerignore file: /builds/australian-e-health-research-centre/digital-health-strengthening-standards-capability/infrastructure/containers/hapi-fhir-jpa-server/.dockerignore
INFO[0000] Retrieving image manifest gcr.io/distroless/java17-debian12:nonroot
INFO[0000] Retrieving image gcr.io/distroless/java17-debian12:nonroot from registry gcr.io
error building image: unable to complete operation after 0 attempts, last error: GET https://gcr.io/v2/token?scope=repository%3Adistroless%2Fjava17-debian12%3Apull&service=gcr.io: UNAUTHORIZED: authentication failed
Triage Notes for the Maintainers
| Description | Yes/No |
|---|---|
| Please check if this is a new feature you are proposing | |
| Please check if the build works in docker but not in kaniko | |
| Please check if this error is seen when you use --cache flag | |
| Please check if your dockerfile is a multistage dockerfile | |
When running as a gitlab job, kaniko automatically sets authentication for you based on the predefined variables:
- CI_REGISTRY
- CI_REGISTRY_USER
- CI_REGISTRY_PASSWORD

If you need different credentials, either inside gitlab or for other registries, you must set them manually.
For example:
build:
  variables:
    DOCKER_CONFIG_JSON: |
      {
        "auths":{
          "$MY_REGISTRY":{
            "auth":"$MY_AUTH"
          }
        }
      }
  before_script:
    - echo $DOCKER_CONFIG_JSON > /kaniko/.docker/config.json
That's the downside of doing things implicitly, it works out of the box until it doesn't, and then everybody gets confused.
It looks like the OP is trying to fetch gcr.io/distroless/java17-debian12:nonroot which doesn't need authentication.
This is a duplicate of https://github.com/GoogleContainerTools/kaniko/issues/1984
Indeed! Sorry for not reading carefully enough. For me, however, this only happens on gitlab.com with runners from gitlab.com (docker). On our self-hosted gitlab + self-hosted runners (k8s) this is not reproducible. I experimented a bit and found this workaround:
build:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:v1.23.2-debug
    entrypoint: [""]
  variables:
    FF_NETWORK_PER_BUILD: true
  script:
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_TAG}"
but I have not yet understood why.
As the other thread mentioned, this works with 1.7.0 and fails starting from 1.8.0. Good opportunity for some good old git bisect action. https://github.com/GoogleContainerTools/kaniko/compare/v1.7.0...v1.8.0
At first glance this is a bit suspicious: https://github.com/GoogleContainerTools/kaniko/commit/09e70e44d9e9a3fecfcf70cb809a654445837631
When I short-circuit the resolve function it indeed no longer fails:
// Debugging hack: skip credential discovery and always pull anonymously.
func resolve(ctx context.Context) authn.Authenticator {
	return authn.Anonymous
}
Running git bisect led me to this commit: https://github.com/GoogleContainerTools/kaniko/commit/633f555c5c13fc1ef08f819cf60a93d19cd44081 I'm currently looking at the changes in that commit; it was meant to fix https://github.com/GoogleContainerTools/kaniko/pull/1856
Looks like when the registry is gcr.io or pkg.dev it ends up calling google.FindDefaultCredentialsWithParams which ends up talking to the GCE metadata server to pull credentials. I speculate this is happening and it's getting an unusable token while running on GitLab's runners.
We ended up fixing this by setting GOOGLE_APPLICATION_CREDENTIALS="/dev/null", which short-circuits that function and makes it return an error, causing the google keychain to fall back to anonymous.
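For a GitLab CI job, the `GOOGLE_APPLICATION_CREDENTIALS` workaround described above could be applied as a job-level variable. This is only a minimal sketch based on the job from the original report. It assumes that a job-level variable reaches the kaniko executor's environment, and it is not taken from kaniko's documentation:

```yaml
build:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:v1.23.2-debug
    entrypoint: [""]
  variables:
    # Assumption (per the comment above): pointing the Google credential lookup
    # at /dev/null makes it fail, so the google keychain falls back to anonymous.
    GOOGLE_APPLICATION_CREDENTIALS: /dev/null
  script:
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_TAG}"
```

If the mechanism is as described, a build that genuinely relies on Google default credentials for private gcr.io or pkg.dev images would presumably lose them with this setting, so it seems best suited to jobs that only pull public images.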
Sorry for not posting this yesterday already. I was contacting gitlab to make sure it is ok for me to write this message publicly, but now that the cat is out of the bag I must say it's quite hilarious: kaniko does what it is supposed to do perfectly well in this case :smile:.
Luckily the access token it receives can't be used for much, i.e. downloading images doesn't work :rofl:. I tried to write the issue report into their logging system but got stuck because I couldn't guess their logstream name :rofl:.
I think we can still improve the situation out of the box for users. The fundamental problem is that if authentication works and a token is received, it is of course used, but if the permissions on the token don't allow image pulling, we simply give up. We never try to run without a token as a fallback. Even better, when we first request the token we should already set the correct scope; the request should then be denied and everything should work as expected.
I agree the situation can be improved. Currently the list of credential sources is hardcoded. Could that be a flag instead so it could be customized if you don't need specific ones, like Google, for example?
At the moment, what is the recommended workaround for this? It should probably be added to the other bug and to the documentation on gitlab while this issue persists. Pulling from the public GCR is likely a very common use case.
Good morning,
I have not yet received an explicit ok from gitlab, but so far they have said they are not concerned, so I think you're right that it's high time to inform the other channel. I think @jameshartig's solution is the neatest so far, since adding the feature flag impacts a lot more than just this bug; at the very least, it also takes way longer to start a runner on their side (hence it is not the default).
> I agree the situation can be improved. Currently the list of credential sources is hardcoded. Could that be a flag instead so it could be customized if you don't need specific ones, like Google, for example?

At the risk of repeating myself:

> That's the downside of doing things implicitly, it works out of the box until it doesn't, and then everybody gets confused.

But I think that, even though the change would be easy, it would be very difficult to get it in, as it breaks both the user interface and the philosophy.
I would agree that implicit is a bad idea. Good documentation that explains how this happens would be the preferred choice from my perspective. Otherwise, when you start trying to understand how it does something, you end up asking: how does that even work? The behaviour becomes counterintuitive.
I think the default could be what is hardcoded now to not make it more complicated for anyone who is fine with the current keychain. Something like --cred-sources with a default value of "google,ecr,acr,gitlab".