bicep icon indicating copy to clipboard operation
bicep copied to clipboard

Bicep for Visual Studio cannot load modules from ACR after upgrading to Azure CLI 2.61

Open dozer75 opened this issue 1 year ago • 11 comments

Bicep version az bicep version 0.28.1 Bicep for Visual Studio version 0.28.1.47646 azure CLI 2.61

Describe the bug Visual Studio fails to load modules from ACR based registries. It gets the exception:

Severity	Code	Description	Project	File	Line	Suppression State
Error (active)	BCP192	Unable to restore the artifact with reference "br:<myacr>.azurecr.io/bicep/modules/my-module:1.0.0": Unhandled exception: Azure.Identity.AuthenticationFailedException: The ChainedTokenCredential failed due to an unhandled exception: Azure CLI authentication timed out.
 ---> Azure.Identity.AuthenticationFailedException: Azure CLI authentication timed out.
   at Azure.Identity.AzureCliCredential.RequestCliAccessTokenAsync(Boolean async, TokenRequestContext context, CancellationToken cancellationToken)
   at Azure.Identity.AzureCliCredential.GetTokenImplAsync(Boolean async, TokenRequestContext requestContext, CancellationToken cancellationToken)
   at Azure.Identity.CredentialDiagnosticScope.FailWrapAndThrow(Exception ex, String additionalMessage, Boolean isCredentialUnavailable)
   at Azure.Identity.AzureCliCredential.GetTokenImplAsync(Boolean async, TokenRequestContext requestContext, CancellationToken cancellationToken)
   at Azure.Identity.AzureCliCredential.GetTokenAsync(TokenRequestContext requestContext, CancellationToken cancellationToken)
   at Azure.Identity.ChainedTokenCredential.GetTokenImplAsync(Boolean async, TokenRequestContext requestContext, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at Azure.Identity.ChainedTokenCredential.GetTokenImplAsync(Boolean async, TokenRequestContext requestContext, CancellationToken cancellationToken)
   at Azure.Identity.ChainedTokenCredential.GetTokenAsync(TokenRequestContext requestContext, CancellationToken cancellationToken)
   at Azure.Containers.ContainerRegistry.ContainerRegistryRefreshTokenCache.GetRefreshTokenFromCredentialAsync(TokenRequestContext context, String service, Boolean async, CancellationToken cancellationToken)
   at Azure.Containers.ContainerRegistry.ContainerRegistryRefreshTokenCache.GetAcrRefreshTokenAsync(HttpMessage message, TokenRequestContext context, String service, Boolean async)
   at Azure.Containers.ContainerRegistry.ContainerRegistryRefreshTokenCache.GetAcrRefreshTokenAsync(HttpMessage message, TokenRequestContext context, String service, Boolean async)
   at Azure.Containers.ContainerRegistry.ContainerRegistryChallengeAuthenticationPolicy.AuthorizeRequestOnChallengeAsyncInternal(HttpMessage message, Boolean async)
   at Azure.Core.Pipeline.BearerTokenAuthenticationPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Core.Pipeline.RedirectPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Core.Pipeline.RetryPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Core.Pipeline.RetryPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Containers.ContainerRegistry.ContainerRegistryRestClient.GetManifestAsync(String name, String reference, String accept, CancellationToken cancellationToken)
   at Azure.Containers.ContainerRegistry.ContainerRegistryContentClient.GetManifestInternalAsync(String reference, Boolean async, CancellationToken cancellationToken)
   at Azure.Containers.ContainerRegistry.ContainerRegistryContentClient.GetManifestAsync(String tagOrDigest, CancellationToken cancellationToken)
   at Bicep.Core.Registry.AzureContainerRegistryManager.DownloadManifestAndLayersAsync(IOciArtifactReference artifactReference, ContainerRegistryContentClient client) in C:\__w\1\s\bicep\src\Bicep.Core\Registry\AzureContainerRegistryManager.cs:line 136
   at Bicep.Core.Registry.AzureContainerRegistryManager.<>c__DisplayClass4_0.<<PullArtifactAsync>g__DownloadManifestInternalAsync|0>d.MoveNext() in C:\__w\1\s\bicep\src\Bicep.Core\Registry\AzureContainerRegistryManager.cs:line 43
--- End of stack trace from previous location ---
   at Bicep.Core.Registry.AzureContainerRegistryManager.PullArtifactAsync(RootConfiguration configuration, IOciArtifactReference artifactReference) in C:\__w\1\s\bicep\src\Bicep.Core\Registry\AzureContainerRegistryManager.cs:line 50
   at Bicep.Core.Registry.OciArtifactRegistry.TryRestoreArtifactAsync(RootConfiguration configuration, OciArtifactReference reference) in C:\__w\1\s\bicep\src\Bicep.Core\Registry\OciArtifactRegistry.cs:line 439		

I suspect that this is due to the new (horrible) az login experience Azure CLI 2.61 implemented which requires UI based authentication.

To Reproduce Steps to reproduce the behavior:

  • Ensure that you have Azure CLI 2.61 installed
  • Have an ACR with a module (Ensure that local cache doesn't contains the module already)
  • Reference that module in your bicep file
  • Open Visual Studio and your bicep file

After a short period of time, the error will occur.

Additional context Running az bicep restore -file <file>.bicep works.

dozer75 avatar Jun 07 '24 08:06 dozer75

We are still investigating the root cause. Workaround is to use command line for building.

puicchan avatar Jul 09 '24 23:07 puicchan

Is there any update on this?

CloudCees avatar Dec 04 '24 13:12 CloudCees

@CloudCees Sorry, it's been hard getting time for it. Is this affecting you badly?

StephenWeatherford avatar Dec 04 '24 19:12 StephenWeatherford

@CloudCees Sorry, it's been hard getting time for it. Is this affecting you badly?

We experience this occasionally, and it happened this time as well. After a few tries with az login and restarting VS Code, it magically disappeared.

CloudCees avatar Dec 05 '24 10:12 CloudCees

@CloudCees Please note, this bug is referring to Visual Studio, not Visual Studio Code. If you're seeing problems in Visual Studio Code, please enter a new bug with repro steps. Thanks!

@dozer75 Is this still affecting you? Thanks.

StephenWeatherford avatar Jan 08 '25 00:01 StephenWeatherford

@StephenWeatherford

Yes it still does, however it seems to work partially... I have 4 modules in my bicep, but 2 of them fails while 2 works...

To reproduce this now I deleted the .bicep/br folder before I started Visual Studio. Otherwise I followed the steps described.

dozer75 avatar Jan 08 '25 11:01 dozer75

This is quite confusing. Bicep uses AzureCliCredential, which makes a call to "az account get-access-token --output json --resource https://management.azure.com", which in turn calls into python via az.cmd: ..\python.exe -IBm azure.cli az account get-access-token --output json --resource https://management.azure.com

python seems to take an inordinate amount of time starting up (but not when I run this from command line, only from within VS).

From procmon: process start at 2:55:10 Image ... And then nothing interesting until 2:56:24: Image

During the 1.5 min wait (it varies, sometimes it's much faster, sometimes even slower), there is no logging info from az, so I don't think the python code has even started running.

Here's the start of the logging I get from az (starting at 14:56:25): 25472 : 2025-04-09 14:56:25,234 : DEBUG : cli.knack.log : File logging enabled - writing logs to 'C:\Users<user>.Azure\logs'. 25472 : 2025-04-09 14:56:25,234 : DEBUG : cli.knack.cli : Command arguments: ['account', 'get-access-token', '--output', 'json', '--resource', 'https://management.azure.com'] 25472 : 2025-04-09 14:56:25,234 : DEBUG : cli.knack.cli : init debug log: Cannot enable color. 25472 : 2025-04-09 14:56:25,234 : DEBUG : cli.knack.cli : Event: Cli.PreExecute [] 25472 : 2025-04-09 14:56:25,235 : DEBUG : cli.knack.cli : Event: CommandParser.OnGlobalArgumentsCreate [<function CLILogging.on_global_arguments at 0x0000016957CB3240>, <function OutputProducer.on_global_arguments at 0x0000016957E4C0E0>, <function CLIQuery.on_global_arguments at 0x0000016957E754E0>] 25472 : 2025-04-09 14:56:25,238 : DEBUG : cli.knack.cli : Event: CommandInvoker.OnPreCommandTableCreate [] 25472 : 2025-04-09 14:56:25,240 : DEBUG : cli.azure.cli.core : Modules found from index for 'account': ['azure.cli.command_modules.profile', 'azure.cli.command_modules.resource'] ...

So I don't think it's anything that az is doing (like waiting for user input).

Currently AzureCliCredential has a timeout on waiting for az's get-access-token of 18 seconds. I can increase that to, say, 2 min, but it doesn't seem even that's guaranteed.

StephenWeatherford avatar Apr 09 '25 22:04 StephenWeatherford

From both "az bicep restore" CLI and vscode, the delay from starting python and the code running is only 1-2 seconds.

StephenWeatherford avatar Apr 09 '25 22:04 StephenWeatherford

Image

ntoskrnl.exe!KeSynchronizeExecution+0x8136
ntoskrnl.exe!KeLowerIrql+0x1a05
ntoskrnl.exe!KeLowerIrql+0x2e27
ntoskrnl.exe!KeWaitForSingleObject+0x256
ntoskrnl.exe!ExReleasePushLockSharedEx+0x9d
ntoskrnl.exe!KeUserModeCallback+0x35c
ntoskrnl.exe!CmCallbackGetKeyObjectID+0x9c30
ntoskrnl.exe!setjmpex+0x9115
ntdll.dll!ZwQueryInformationFile+0x14
KERNELBASE.dll!SetFilePointerEx+0xbb
ucrtbase.dll!fputc+0x2ad
ucrtbase.dll!fseeki64+0xee
ucrtbase.dll!lseeki64+0x28
python312.dll!PyThread_allocate_lock+0x474
python312.dll!PyThread_allocate_lock+0x15f
python312.dll!PyUnicode_InsertThousandsGrouping+0x7a8
python312.dll!PyObject_VectorcallMethod+0xab
python312.dll!PyThread_allocate_lock+0x87c
python312.dll!PyThread_allocate_lock+0x81d
python312.dll!PyThread_allocate_lock+0x682
python312.dll!PyThread_allocate_lock+0x5ab
python312.dll!PyObject_Vectorcall+0x427
python312.dll!PyObject_CallFunction_SizeT+0xdf
python312.dll!PyObject_CallFunction_SizeT+0x48
python312.dll!PyObject_CallMethod+0x58a
python312.dll!PyOS_FSPath+0x1d6
python312.dll!PySet_Add+0x60e
python312.dll!PyObject_Vectorcall+0xcf
python312.dll!PyObject_CallFunction_SizeT+0xdf
python312.dll!PyObject_CallMethod+0xd9
python312.dll!PyObject_CallMethod+0x61
python312.dll!PyList_Insert+0x1042
python312.dll!PyList_Insert+0xda5
python312.dll!Py_ClearStandardStreamEncoding+0x18e
python312.dll!Py_InitializeFromConfig+0x108
python312.dll!Py_InitializeFromConfig+0x82
python312.dll!Py_PreInitializeFromPyArgv+0x3e2
python312.dll!Py_Main+0x3d
python312.dll!Py_Main+0x25
python.exe+0x1230
KERNEL32.DLL!BaseThreadInitThunk+0x1d
ntdll.dll!RtlUserThreadStart+0x28

StephenWeatherford avatar Apr 09 '25 22:04 StephenWeatherford

Don't think it's waiting for user input because it generally succeeds after a while.

Not a run-in-parallel issue like https://github.com/Azure/azure-sdk-for-net/issues/27722 because it repros with only a single restore.

Admin settings won't let me turn off Defender real-time to see if antivirus is the issue. But adding python.exe to exclusions doesn't affect anything. Image

Don't think it's https://github.com/Azure/azure-sdk-for-net/issues/32993 because it does work after a random amount of time.

Doesn't repro if I change az.cmd to call python as the argument to tee.

Changing python's process priority doesn't change anything. I tried changing StartProcessInfo's settings but didn't seem to help.

Haven't tried downgrading to Az CLI < 2.61. Using CLI 2.71.0, python 3.12.8.

StephenWeatherford avatar Apr 10 '25 16:04 StephenWeatherford

It does seem pretty consistent that it does succeed, even if it's long. Longest I've seen so far is 160s (2.7min). I hate increasing timeout for that long if it's inconsistent and is timing out because the process is blocking on something, but since it seems consistent, I'll go ahead and increase to 3 or 4 minutes.

StephenWeatherford avatar Apr 10 '25 17:04 StephenWeatherford