Bicep for Visual Studio cannot load modules from ACR after upgrading to Azure CLI 2.61
Bicep version az bicep version 0.28.1 Bicep for Visual Studio version 0.28.1.47646 azure CLI 2.61
Describe the bug Visual Studio fails to load modules from ACR based registries. It gets the exception:
Severity Code Description Project File Line Suppression State
Error (active) BCP192 Unable to restore the artifact with reference "br:<myacr>.azurecr.io/bicep/modules/my-module:1.0.0": Unhandled exception: Azure.Identity.AuthenticationFailedException: The ChainedTokenCredential failed due to an unhandled exception: Azure CLI authentication timed out.
---> Azure.Identity.AuthenticationFailedException: Azure CLI authentication timed out.
at Azure.Identity.AzureCliCredential.RequestCliAccessTokenAsync(Boolean async, TokenRequestContext context, CancellationToken cancellationToken)
at Azure.Identity.AzureCliCredential.GetTokenImplAsync(Boolean async, TokenRequestContext requestContext, CancellationToken cancellationToken)
at Azure.Identity.CredentialDiagnosticScope.FailWrapAndThrow(Exception ex, String additionalMessage, Boolean isCredentialUnavailable)
at Azure.Identity.AzureCliCredential.GetTokenImplAsync(Boolean async, TokenRequestContext requestContext, CancellationToken cancellationToken)
at Azure.Identity.AzureCliCredential.GetTokenAsync(TokenRequestContext requestContext, CancellationToken cancellationToken)
at Azure.Identity.ChainedTokenCredential.GetTokenImplAsync(Boolean async, TokenRequestContext requestContext, CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at Azure.Identity.ChainedTokenCredential.GetTokenImplAsync(Boolean async, TokenRequestContext requestContext, CancellationToken cancellationToken)
at Azure.Identity.ChainedTokenCredential.GetTokenAsync(TokenRequestContext requestContext, CancellationToken cancellationToken)
at Azure.Containers.ContainerRegistry.ContainerRegistryRefreshTokenCache.GetRefreshTokenFromCredentialAsync(TokenRequestContext context, String service, Boolean async, CancellationToken cancellationToken)
at Azure.Containers.ContainerRegistry.ContainerRegistryRefreshTokenCache.GetAcrRefreshTokenAsync(HttpMessage message, TokenRequestContext context, String service, Boolean async)
at Azure.Containers.ContainerRegistry.ContainerRegistryRefreshTokenCache.GetAcrRefreshTokenAsync(HttpMessage message, TokenRequestContext context, String service, Boolean async)
at Azure.Containers.ContainerRegistry.ContainerRegistryChallengeAuthenticationPolicy.AuthorizeRequestOnChallengeAsyncInternal(HttpMessage message, Boolean async)
at Azure.Core.Pipeline.BearerTokenAuthenticationPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
at Azure.Core.Pipeline.RedirectPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
at Azure.Core.Pipeline.RetryPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
at Azure.Core.Pipeline.RetryPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
at Azure.Containers.ContainerRegistry.ContainerRegistryRestClient.GetManifestAsync(String name, String reference, String accept, CancellationToken cancellationToken)
at Azure.Containers.ContainerRegistry.ContainerRegistryContentClient.GetManifestInternalAsync(String reference, Boolean async, CancellationToken cancellationToken)
at Azure.Containers.ContainerRegistry.ContainerRegistryContentClient.GetManifestAsync(String tagOrDigest, CancellationToken cancellationToken)
at Bicep.Core.Registry.AzureContainerRegistryManager.DownloadManifestAndLayersAsync(IOciArtifactReference artifactReference, ContainerRegistryContentClient client) in C:\__w\1\s\bicep\src\Bicep.Core\Registry\AzureContainerRegistryManager.cs:line 136
at Bicep.Core.Registry.AzureContainerRegistryManager.<>c__DisplayClass4_0.<<PullArtifactAsync>g__DownloadManifestInternalAsync|0>d.MoveNext() in C:\__w\1\s\bicep\src\Bicep.Core\Registry\AzureContainerRegistryManager.cs:line 43
--- End of stack trace from previous location ---
at Bicep.Core.Registry.AzureContainerRegistryManager.PullArtifactAsync(RootConfiguration configuration, IOciArtifactReference artifactReference) in C:\__w\1\s\bicep\src\Bicep.Core\Registry\AzureContainerRegistryManager.cs:line 50
at Bicep.Core.Registry.OciArtifactRegistry.TryRestoreArtifactAsync(RootConfiguration configuration, OciArtifactReference reference) in C:\__w\1\s\bicep\src\Bicep.Core\Registry\OciArtifactRegistry.cs:line 439
I suspect that this is due to the new (horrible) az login experience Azure CLI 2.61 implemented which requires UI based authentication.
To Reproduce Steps to reproduce the behavior:
- Ensure that you have Azure CLI 2.61 installed
- Have an ACR with a module (Ensure that local cache doesn't contains the module already)
- Reference that module in your bicep file
- Open Visual Studio and your bicep file
After a short period of time, the error will occur.
Additional context
Running az bicep restore -file <file>.bicep works.
We are still investigating the root cause. Workaround is to use command line for building.
Is there any update on this?
@CloudCees Sorry, it's been hard getting time for it. Is this affecting you badly?
@CloudCees Sorry, it's been hard getting time for it. Is this affecting you badly?
We experience this occasionally, and it happened this time as well. After a few tries with az login and restarting VS Code, it magically disappeared.
@CloudCees Please note, this bug is referring to Visual Studio, not Visual Studio Code. If you're seeing problems in Visual Studio Code, please enter a new bug with repro steps. Thanks!
@dozer75 Is this still affecting you? Thanks.
@StephenWeatherford
Yes it still does, however it seems to work partially... I have 4 modules in my bicep, but 2 of them fails while 2 works...
To reproduce this now I deleted the .bicep/br folder before I started Visual Studio. Otherwise I followed the steps described.
This is quite confusing. Bicep uses AzureCliCredential, which makes a call to "az account get-access-token --output json --resource https://management.azure.com", which in turn calls into python via az.cmd: ..\python.exe -IBm azure.cli az account get-access-token --output json --resource https://management.azure.com
python seems to take an inordinate amount of time starting up (but not when I run this from command line, only from within VS).
From procmon:
process start at 2:55:10
...
And then nothing interesting until 2:56:24:
During the 1.5 min wait (it varies, sometimes it's much faster, sometimes even slower), there is no logging info from az, so I don't think the python code has even started running.
Here's the start of the logging I get from az (starting at 14:56:25): 25472 : 2025-04-09 14:56:25,234 : DEBUG : cli.knack.log : File logging enabled - writing logs to 'C:\Users<user>.Azure\logs'. 25472 : 2025-04-09 14:56:25,234 : DEBUG : cli.knack.cli : Command arguments: ['account', 'get-access-token', '--output', 'json', '--resource', 'https://management.azure.com'] 25472 : 2025-04-09 14:56:25,234 : DEBUG : cli.knack.cli : init debug log: Cannot enable color. 25472 : 2025-04-09 14:56:25,234 : DEBUG : cli.knack.cli : Event: Cli.PreExecute [] 25472 : 2025-04-09 14:56:25,235 : DEBUG : cli.knack.cli : Event: CommandParser.OnGlobalArgumentsCreate [<function CLILogging.on_global_arguments at 0x0000016957CB3240>, <function OutputProducer.on_global_arguments at 0x0000016957E4C0E0>, <function CLIQuery.on_global_arguments at 0x0000016957E754E0>] 25472 : 2025-04-09 14:56:25,238 : DEBUG : cli.knack.cli : Event: CommandInvoker.OnPreCommandTableCreate [] 25472 : 2025-04-09 14:56:25,240 : DEBUG : cli.azure.cli.core : Modules found from index for 'account': ['azure.cli.command_modules.profile', 'azure.cli.command_modules.resource'] ...
So I don't think it's anything that az is doing (like waiting for user input).
Currently AzureCliCredential has a timeout on waiting for az's get-access-token of 18 seconds. I can increase that to, say, 2 min, but it doesn't seem even that's guaranteed.
From both "az bicep restore" CLI and vscode, the delay from starting python and the code running is only 1-2 seconds.
ntoskrnl.exe!KeSynchronizeExecution+0x8136
ntoskrnl.exe!KeLowerIrql+0x1a05
ntoskrnl.exe!KeLowerIrql+0x2e27
ntoskrnl.exe!KeWaitForSingleObject+0x256
ntoskrnl.exe!ExReleasePushLockSharedEx+0x9d
ntoskrnl.exe!KeUserModeCallback+0x35c
ntoskrnl.exe!CmCallbackGetKeyObjectID+0x9c30
ntoskrnl.exe!setjmpex+0x9115
ntdll.dll!ZwQueryInformationFile+0x14
KERNELBASE.dll!SetFilePointerEx+0xbb
ucrtbase.dll!fputc+0x2ad
ucrtbase.dll!fseeki64+0xee
ucrtbase.dll!lseeki64+0x28
python312.dll!PyThread_allocate_lock+0x474
python312.dll!PyThread_allocate_lock+0x15f
python312.dll!PyUnicode_InsertThousandsGrouping+0x7a8
python312.dll!PyObject_VectorcallMethod+0xab
python312.dll!PyThread_allocate_lock+0x87c
python312.dll!PyThread_allocate_lock+0x81d
python312.dll!PyThread_allocate_lock+0x682
python312.dll!PyThread_allocate_lock+0x5ab
python312.dll!PyObject_Vectorcall+0x427
python312.dll!PyObject_CallFunction_SizeT+0xdf
python312.dll!PyObject_CallFunction_SizeT+0x48
python312.dll!PyObject_CallMethod+0x58a
python312.dll!PyOS_FSPath+0x1d6
python312.dll!PySet_Add+0x60e
python312.dll!PyObject_Vectorcall+0xcf
python312.dll!PyObject_CallFunction_SizeT+0xdf
python312.dll!PyObject_CallMethod+0xd9
python312.dll!PyObject_CallMethod+0x61
python312.dll!PyList_Insert+0x1042
python312.dll!PyList_Insert+0xda5
python312.dll!Py_ClearStandardStreamEncoding+0x18e
python312.dll!Py_InitializeFromConfig+0x108
python312.dll!Py_InitializeFromConfig+0x82
python312.dll!Py_PreInitializeFromPyArgv+0x3e2
python312.dll!Py_Main+0x3d
python312.dll!Py_Main+0x25
python.exe+0x1230
KERNEL32.DLL!BaseThreadInitThunk+0x1d
ntdll.dll!RtlUserThreadStart+0x28
Don't think it's waiting for user input because it generally succeeds after a while.
Not a run-in-parallel issue like https://github.com/Azure/azure-sdk-for-net/issues/27722 because it repros with only a single restore.
Admin settings won't let me turn off Defender real-time to see if antivirus is the issue.
But adding python.exe to exclusions doesn't affect anything.
Don't think it's https://github.com/Azure/azure-sdk-for-net/issues/32993 because it does work after a random amount of time.
Doesn't repro if I change az.cmd to call python as the argument to tee.
Changing python's process priority doesn't change anything. I tried changing StartProcessInfo's settings but didn't seem to help.
Haven't tried downgrading to Az CLI < 2.61. Using CLI 2.71.0, python 3.12.8.
It does seem pretty consistent that it does succeed, even if it's long. Longest I've seen so far is 160s (2.7min). I hate increasing timeout for that long if it's inconsistent and is timing out because the process is blocking on something, but since it seems consistent, I'll go ahead and increase to 3 or 4 minutes.