ACR w/ anonymous pull sometimes fails with 401 using buildx push

Open devigned opened this issue 2 years ago • 6 comments

Describe the bug

When using an ACR with anonymous pull enabled and using buildx with the registry output type (to push images to ACR), the image push sometimes fails with a 401. This behavior was observed in cloud-provider-azure (#855).

This link shows a failing example from cloud-provider-azure (#855): https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cloud-provider-azure/855/pull-cloud-provider-azure-e2e-ccm-capz/1450736711356125184/build-log.txt

To get around the issue in https://github.com/kubernetes-sigs/cloud-provider-azure/pull/855, the script needed to build locally with buildx, then use docker push to send the artifacts to ACR.

This link shows a working example from cloud-provider-azure (#855): https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cloud-provider-azure/855/pull-cloud-provider-azure-e2e-ccm-capz/1451055064192913408/build-log.txt

To Reproduce

Steps to reproduce the behavior:

  1. Use an ACR with anonymous pull enabled
  2. az acr login -n name-of-registry
  3. docker buildx build --pull --output=type=registry --platform linux/amd64 --build-arg ENABLE_GIT_COMMAND="true" --build-arg ARCH="amd64" --build-arg VERSION="" --file cloud-node-manager.Dockerfile --tag capzci.azurecr.io/azure-cloud-node-manager-linux:8faf43ff-amd64 .

Expected behavior

docker buildx build should succeed and the container artifacts should be in ACR.

Any relevant environment information

  • Docker version 20.10.9, build c2ea9bc
  • capzci.azurecr.io UTC 10-20 09:00

Additional context

Discussed with @northtyphoon, @feiskyer, @mainred, @CecileRobertMichon, @cpuguy83

devigned, Oct 21 '21 17:10

This seems to happen specifically when using a containerized buildkit instance (docker buildx create --use && docker buildx build ...) and it always happens in that case.

cpuguy83, Oct 21 '21 21:10

In my traces I'm seeing 401s when calling HEAD on blob URLs as well as on POST /blobs/uploads/.
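For context, the two requests in these traces are the first steps of a blob push in the registry HTTP API (OCI Distribution Specification): a HEAD on /v2/<name>/blobs/<digest> to check whether the blob already exists, and a POST to /v2/<name>/blobs/uploads/ to open an upload session. A 401 on either means the registry did not accept the client's authorization for that repository. The short sketch below only spells out those request lines and the repository:<name>:pull,push scope a pushing client asks the token service for; the helper names and the digest are hypothetical placeholders, not values from the traces.

```python
# Sketch of the two registry requests named in the traces, following the
# OCI Distribution Specification. Helper names and the digest are hypothetical.


def blob_head_request(repo: str, digest: str) -> str:
    """HEAD request a client sends to check whether a blob already exists."""
    return f"HEAD /v2/{repo}/blobs/{digest}"


def upload_start_request(repo: str) -> str:
    """POST request that opens a new blob upload session."""
    return f"POST /v2/{repo}/blobs/uploads/"


def push_scope(repo: str) -> str:
    """Scope string a client requests from the token service before pushing."""
    return f"repository:{repo}:pull,push"


if __name__ == "__main__":
    repo = "azure-cloud-node-manager-linux"  # repository name from the repro command
    digest = "sha256:" + "0" * 64             # placeholder digest, not from the traces
    print(blob_head_request(repo, digest))
    print(upload_start_request(repo))
    print(push_scope(repo))
```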

cpuguy83, Oct 21 '21 22:10

The Azure disk CSI driver has been using docker buildx with a containerized buildkit instance for a long time, and I still can't figure out why it has always worked with the k8sprow.azurecr.io ACR.

https://github.com/kubernetes-sigs/azuredisk-csi-driver/blob/3a9098c345e16d04c5c686c338e80c17c8c2a849/Makefile#L188

andyzhangx, Oct 22 '21 01:10

You aren't pushing with buildkit in that.

cpuguy83, Oct 22 '21 15:10

Running docker login -u ${AZURE_CLIENT_ID} -p ${AZURE_CLIENT_SECRET} ${REGISTRY} before docker buildx build --push can work around the issue.

mainred, Jan 11 '22 04:01

Thanks to @northtyphoon for the investigation. Here is a copy of his reply:

The above conditions run into a corner case that hits a buildkit bug (https://github.com/moby/buildkit/blob/ffe2301031c8f8bfb8d5fc5034e5e509c5624913/session/auth/authprovider/authprovider.go#L91). az acr login saves a token credential in the credential store. The client is supposed to follow the OAuth2 protocol (Oauth2 Token Authentication | Docker Documentation) to acquire the token, but buildkit doesn't.

The above code contains a hack: it first uses whatever credential it has to attempt basic auth (a token credential cannot be used for basic auth because it has no user name) and expects the registry to return 401, after which it retries with OAuth2.

However, for a registry with anonymous pull enabled, ACR has fallback logic that returns 200 with a default image-pull access token. buildkit treats it as a valid token, uses it to push the image, and the push fails.
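The interaction described in this reply can be illustrated with a small toy model. This is a hedged sketch, not buildkit's or ACR's actual code: ToyRegistry, Credential, fetch_token and push are hypothetical names, and the registry rules (which credentials are valid, which actions each token grants) are assumptions chosen only to mirror the description above. The client tries basic auth first and falls back to the OAuth2 refresh-token exchange only on a 401; with anonymous pull enabled, the basic attempt is answered with 200 and a pull-only token, so the subsequent push is rejected.

```python
# Toy model of the token-fetch interaction described above.
# NOT buildkit's or ACR's real code: all names and rules here are hypothetical.
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Set


@dataclass(frozen=True)
class Credential:
    username: Optional[str]  # None models a token-only credential (no user name)
    secret: str              # password, or refresh token for a token-only credential


@dataclass(frozen=True)
class TokenResponse:
    status: int
    actions: FrozenSet[str]  # actions the issued token allows


@dataclass
class ToyRegistry:
    anonymous_pull: bool
    users: Dict[str, str] = field(default_factory=dict)    # username -> password
    refresh_tokens: Set[str] = field(default_factory=set)  # valid refresh tokens

    def basic_auth(self, cred: Credential) -> TokenResponse:
        """Token request with a basic-auth header (assumed behaviour)."""
        if cred.username is not None and self.users.get(cred.username) == cred.secret:
            return TokenResponse(200, frozenset({"pull", "push"}))
        if self.anonymous_pull:
            # Fallback described in the reply: 200 with a default pull-only token.
            return TokenResponse(200, frozenset({"pull"}))
        return TokenResponse(401, frozenset())

    def oauth2_refresh(self, cred: Credential) -> TokenResponse:
        """OAuth2 refresh-token exchange (assumed behaviour)."""
        if cred.secret in self.refresh_tokens:
            return TokenResponse(200, frozenset({"pull", "push"}))
        return TokenResponse(401, frozenset())


def fetch_token(registry: ToyRegistry, cred: Credential) -> FrozenSet[str]:
    """Client logic as described: basic auth first, OAuth2 only after a 401."""
    resp = registry.basic_auth(cred)
    if resp.status == 401:
        resp = registry.oauth2_refresh(cred)
    if resp.status != 200:
        raise PermissionError(f"token request failed with {resp.status}")
    return resp.actions


def push(registry: ToyRegistry, cred: Credential) -> str:
    """Pretend push: succeeds only if the fetched token grants 'push'."""
    return "pushed" if "push" in fetch_token(registry, cred) else "401 Unauthorized"


if __name__ == "__main__":
    token_cred = Credential(username=None, secret="refresh-token")
    normal = ToyRegistry(anonymous_pull=False, refresh_tokens={"refresh-token"})
    anonymous = ToyRegistry(anonymous_pull=True, refresh_tokens={"refresh-token"})
    print("anonymous pull off:", push(normal, token_cred))     # pushed
    print("anonymous pull on:", push(anonymous, token_cred))   # 401 Unauthorized
```

In this model, a username/password credential (like the one used in the docker login workaround) obtains a push-capable token at the basic-auth step, which is one plausible reading of why that workaround helps; the toy model itself does not establish that ACR behaves this way.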

mainred, Jan 11 '22 06:01

Closing as this has been inactive for over three months. Please reopen this issue if you would like additional guidance.

This has been fixed.

terencet-dev, Nov 17 '22 21:11