ACR w/ anonymous pull sometimes fails with 401 using buildx push
Describe the bug When using an ACR with anonymous pull enabled and buildx with the output type of registry (to push images to ACR), the image push sometimes fails with a 401. This behavior was observed in cloud-provider-azure (#855).
This link shows a failing example from cloud-provider-azure (#855): https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cloud-provider-azure/855/pull-cloud-provider-azure-e2e-ccm-capz/1450736711356125184/build-log.txt
To get around the issue in https://github.com/kubernetes-sigs/cloud-provider-azure/pull/855, the script needed to build locally with buildx, then use docker push to send the artifacts to ACR.
This link shows a working example from cloud-provider-azure (#855): https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cloud-provider-azure/855/pull-cloud-provider-azure-e2e-ccm-capz/1451055064192913408/build-log.txt
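The two-step workaround described above could look roughly like the sketch below. The function name build_then_push is hypothetical, and using --load to export the build into the local image store is an assumption; the exact flags used in the PR are not shown in this thread.

```shell
#!/bin/sh
# Sketch of the workaround: build with buildx into the local Docker image
# store, then push with the Docker CLI instead of letting buildx push.
# build_then_push is a hypothetical helper; image and Dockerfile are supplied
# by the caller.

build_then_push() {
  image=$1
  dockerfile=$2
  # --load is shorthand for --output=type=docker (single-platform builds).
  docker buildx build --pull --load --platform linux/amd64 \
    --file "$dockerfile" --tag "$image" . || return 1
  # The push goes through the Docker daemon and its own registry auth flow.
  docker push "$image"
}
```

It would be called as build_then_push capzci.azurecr.io/azure-cloud-node-manager-linux:8faf43ff-amd64 cloud-node-manager.Dockerfile after az acr login.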
To Reproduce Steps to reproduce the behavior:
- Use an ACR with anonymous pull enabled
- az acr login -n name-of-registry
- docker buildx build --pull --output=type=registry --platform linux/amd64 --build-arg ENABLE_GIT_COMMAND="true" --build-arg ARCH="amd64" --build-arg VERSION="" --file cloud-node-manager.Dockerfile --tag capzci.azurecr.io/azure-cloud-node-manager-linux:8faf43ff-amd64 .
Expected behavior docker buildx build should succeed and container artifacts should be in ACR.
Any relevant environment information
- Docker version 20.10.9, build c2ea9bc
- capzci.azurecr.io UTC 10-20 09:00
Additional context Discussed with @northtyphoon, @feiskyer, @mainred, @CecileRobertMichon, @cpuguy83
This seems to happen specifically when using a containerized buildkit instance (docker buildx create --use && docker buildx build ...) and it always happens in that case.
In my traces I'm seeing 401s when calling HEAD on blob URLs as well as POST /blobs/uploads/
The Azure disk CSI driver has been using docker buildx with a containerized buildkit instance for a long time, and I still could not figure out why it has always worked with the k8sprow.azurecr.io ACR.
https://github.com/kubernetes-sigs/azuredisk-csi-driver/blob/3a9098c345e16d04c5c686c338e80c17c8c2a849/Makefile#L188
You aren't pushing with buildkit in that.
docker login -u ${AZURE_CLIENT_ID} -p ${AZURE_CLIENT_SECRET} ${REGISTRY} before docker buildx build --push can work as a workaround.
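As a rough sketch of that workaround, assuming AZURE_CLIENT_ID, AZURE_CLIENT_SECRET and REGISTRY are set in the environment as in the command above: the function name login_then_buildx_push is hypothetical, and --password-stdin is swapped in for -p only to keep the secret off the command line.

```shell
#!/bin/sh
# Sketch: log in with service principal credentials, then push from buildx.
# login_then_buildx_push is a hypothetical helper; image and Dockerfile are
# supplied by the caller.

login_then_buildx_push() {
  image=$1
  dockerfile=$2
  # Username/password credentials work with basic auth, unlike the token
  # credential stored by az acr login.
  printf '%s' "${AZURE_CLIENT_SECRET:?}" |
    docker login -u "${AZURE_CLIENT_ID:?}" --password-stdin "${REGISTRY:?}" || return 1
  docker buildx build --push --platform linux/amd64 \
    --file "$dockerfile" --tag "$image" .
}
```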
Thanks for the investigation from @northtyphoon, here is my copy and paste of his reply:
The above conditions run into a corner case and hit a buildkit bug (https://github.com/moby/buildkit/blob/ffe2301031c8f8bfb8d5fc5034e5e509c5624913/session/auth/authprovider/authprovider.go#L91). az acr login saves a token credential in the credential store. The client is supposed to follow the OAuth2 protocol (Oauth2 Token Authentication | Docker Documentation) to acquire the token, but buildkit doesn't.
The code above contains a hack: it first tries basic auth with whatever credential it has (a token credential cannot be used in basic auth because it has no user name) and expects the registry to return 401, after which it retries with OAuth2.
However, ACR has fallback logic that returns 200 with a default image pull access token for registries with anonymous pull enabled. buildkit treats it as a valid token, uses it to push the image, and the push fails.
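The interaction described above can be modelled with a small toy script. This is only an illustration of the explanation in this comment, not buildkit or ACR code: the function names, the credential kinds (password, identity_token) and the scope strings are all hypothetical.

```shell
#!/bin/sh
# Toy model of the corner case. Nothing here talks to a real registry.

# Registry side: answer a basic-auth token request with "STATUS SCOPE".
# $1 = credential kind (password | identity_token), $2 = anonymous pull (true | false)
basic_auth_token() {
  if [ "$1" = "password" ]; then
    echo "200 pull,push"   # user name and password are valid for basic auth
  elif [ "$2" = "true" ]; then
    echo "200 pull"        # fallback: default pull-only token
  else
    echo "401 none"        # token credential rejected for basic auth
  fi
}

# Registry side: the OAuth2 flow with the stored token credential.
oauth2_token() {
  echo "200 pull,push"
}

# Client side: try basic auth first and fall back to OAuth2 only on 401.
client_push() {
  reply=$(basic_auth_token "$1" "$2")
  status=${reply%% *}
  scope=${reply#* }
  if [ "$status" = "401" ]; then
    reply=$(oauth2_token)
    scope=${reply#* }
  fi
  case "$scope" in
    *push*) echo "push ok" ;;
    *) echo "push failed: 401" ;;
  esac
}
```

In this model, client_push identity_token true is the failing combination: the 200 from the fallback means the OAuth2 retry never happens, so the client is left with a pull-only token.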
Closing as this has been inactive for over three months. Please reopen this issue if you would like additional guidance.
This has been fixed.