microsoft-identity-web icon indicating copy to clipboard operation
microsoft-identity-web copied to clipboard

MsalUiRequiredException occasionally thrown due to GetAccountAsync not returning an account

Open mevans-medpace opened this issue 2 years ago • 6 comments

Microsoft.Identity.Web Library

Microsoft.Identity.Web.TokenCache

Microsoft.Identity.Web version

1.25.0

Web app

Sign-in users and call web APIs

Web API

Protected web APIs (validating tokens)

Token cache serialization

Distributed caches

Description

We've been seeing some strange behavior in our applications since converting to use Microsoft.Identity.Web. About 5-10% of the time we get a MsalUiRequiredException thrown when making a call to a downstream API. This typically happens immediately after starting the web application and launching it. The app is an Angular app and it makes several requests for data when the landing page loads.

After stepping through the code, the issue seems to be that, occasionally, the call to GetAccountAsync for the confidential client application does not return an IAccount in GetAuthenticationResultForWebAppWithAccountFromCacheAsync (TokenAcquisition.cs, line 957). This causes the call to AcquireTokenSilent to throw the exception since we pass in a null account (and no login hint).

We are using a distributed token cache, in this case a Sql cache. I have verified that the accountIdentifier from claimsPrincipal?.GetMsalAccountId() seems to be correct and the record exists in the L2 Sql cache. The account is retrieved correctly most of the time, but sometimes, unpredictably, it fails.

I have tried disabling the L1 cache using MsalDistributedTokenCacheAdapterOptions, thinking that there was a cache miss, but I have had it happen even with that option set.

I placed a conditional breakpoint in GetAuthenticationResultForWebAppWithAccountFromCacheAsync for when the account was null. I tried calling the GetAccountAsync method a second time, and it did return the IAccount.

I've had the most success re-creating the error by deleting the row for my user from the SQL cache, then starting the application twice. The first time, the token is acquired and added to the SQL cache. The second time, it will often (but not always) fail.

See below for snippets as for how we are configuring the application.

Reproduction steps

  1. Start Application
  2. Enter login credentials
  3. Application makes a call to a downstream API that requires some scopes using an extension method.
  4. If call to GetAccountAsync fails within Microsoft.Identity.Web, MsalUiRequiredException is thrown and the call to the downstream API fails. This does not happen every time the call is made.

Error message

IDW10502: An MsalUiRequiredException was thrown due to a challenge for the user. See https://aka.ms/ms-id-web/ca_incremental-consent.

Inner exception: No account or login hint was passed to the AcquireTokenSilent call. at Microsoft.Identity.Client.Internal.Requests.Silent.SilentRequest.<ExecuteAsync>d__5.MoveNext() at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at Microsoft.Identity.Client.Internal.Requests.Silent.SilentRequest.<ExecuteAsync>d__5.MoveNext() at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.Identity.Client.Internal.Requests.RequestBase.<RunAsync>d__12.MoveNext() at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.Identity.Client.ApiConfig.Executors.ClientApplicationBaseExecutor.<ExecuteAsync>d__2.MoveNext() at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.ConfiguredTaskAwaitable1.ConfiguredTaskAwaiter.GetResult() at Microsoft.Identity.Web.TokenAcquisition.<GetAuthenticationResultForWebAppWithAccountFromCacheAsync>d__31.MoveNext() in C:\microsoft-identity-web-master\src\Microsoft.Identity.Web\TokenAcquisition.cs:line 961 at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.ConfiguredTaskAwaitable1.ConfiguredTaskAwaiter.GetResult() at Microsoft.Identity.Web.TokenAcquisition.<GetAuthenticationResultForUserAsync>d__16.MoveNext() in C:\microsoft-identity-web-master\src\Microsoft.Identity.Web\TokenAcquisition.cs:line 316

Id Web logs

There are no unusual logs when this occurs.

Relevant code snippets

// Adding authentication, enabling for calling downstream APIs, and adding the token caches

var builder = services.AddMicrosoftIdentityWebAppAuthentication(configuration)
                    .EnableTokenAcquisitionToCallDownstreamApi();
                    
foreach (string downstreamApiName in downstreamApiNames)
{
    builder.AddDownstreamWebApi(downstreamApiName, configuration.GetSection(downstreamApiName));
}               

builder.AddDistributedTokenCaches();

// Adding SQL token cache:
services.AddDistributedSqlServerCache(options =>
{
    options.ConnectionString = configuration.GetConnectionString("DistributedCacheDb");
    options.SchemaName = tokenCacheOptions.SqlCacheSchemaName;
    options.TableName = tokenCacheOptions.SqlCacheTableName;
});



/// Extension method that calls the downstream API
public static async Task<HttpResponseMessage> CallWebApiForOurAppUserStringContentAsync(this IDownstreamWebApi downstreamWebApi, string baseUrl, string relativePath, ClaimsPrincipal claimsPrincipal, HttpMethod httpMethod, string content = null)
{
    return await downstreamWebApi.CallWebApiForUserAsync(
        serviceName: WebApiNames.OurAppApi, // a constant value that matches the downstreamApiName from above
        options =>
        {
            options.BaseUrl = baseUrl;
            options.RelativePath = relativePath;
            options.HttpMethod = httpMethod;
        },
        user: claimsPrincipal,
        content: !string.IsNullOrEmpty(content) ? new StringContent(content, Encoding.UTF8, "application/json") : null);
}

Regression

No response

Expected behavior

Application is able to get the token from the cache every time.

mevans-medpace avatar Jul 19 '22 12:07 mevans-medpace

After doing some more research and stepping through the Microsoft.Identity.Web and Microsoft.Identity.Client source code, I believe I've discovered the circumstances under which this is occurring. We have multiple requests to APIs that are called in parallel. Normally, the TokenAcquisition class calls the GetAccountAsync method on IConfidentialClientApplication and returns the account. Several levels deep in Microsoft.Identity.Client, the InMemoryPartitionedUserTokenCacheAccessor tries to lookup the MsalAccountCacheItem using the partition key (user_id.tenant_id). But sometimes it fails because the AccountCacheDictionary has nothing in it even though it "should".

This led me to try to figure out what would be clearing that cache. The Clear method on the InMemoryPartitionedUserTokenCacheAccessor is being called during the OnBeforeAccessAsync method in the MsalAbstractTokenCacheProvider in Microsoft.Identity.Web.TokenCache. This happens after another read operation occurs when it's trying to deserialize what was in the cache (line 117):

args.TokenCache.DeserializeMsalV3(UnprotectBytes(tokenCacheBytes), shouldClearExistingCache: true)

The flag to clear the existing cache is set and eventually in MIC's TokenCacheJsonSerializer.Deserialize method, it does call the Clear method on the accessor before saving the new tokens and accounts (lines 71-106).

I believe that what is happening is that in the call to retrieve the account from the cache is happening between the cache being cleared and the items in it being re-added, from a subsequent request.

Is there anything we can do to mitigate this? I can re-attempt the call if it fails but that seems like a bad solution. I'm also not sure if this issue might need to be raised in the MSAL library instead of this one.

mevans-medpace avatar Jul 22 '22 14:07 mevans-medpace

Looks like this issue may be known by the MSAL team: https://github.com/AzureAD/microsoft-authentication-library-for-dotnet/issues/3385#issuecomment-1156364249

The comment implies that in this scenario, MSAL would just look up the account again but this doesn't seem to be what is happening, instead, no account is returned, the call to AcquireTokenSilent fails, and the MsalUiRequiredException is thrown. Is there a piece I'm missing?

mevans-medpace avatar Jul 22 '22 15:07 mevans-medpace

@bgavrilMS @pmaytak : is it something you are aware of?

jmprieur avatar Jul 23 '22 23:07 jmprieur

@mevans-medpace The post/comment you mention describes this behavior when the confidential client app instance is static/reused. Then yes, two different requests can operate on the same internal cache (accessor objects). But I think Microsoft.Identity.Web creates TokenAcquisition class with scoped (per request) lifetime by default. So each request should have its own MSAL instance. One suggestion is you can try with "EnableCacheSynchronization": "true" - this locks the cache during the operations. Another is to take a look at Verbose logs. Two different requests should have two different correlation IDs, so logs should show the cache state per each request.

pmaytak avatar Jul 26 '22 04:07 pmaytak

@pmaytak Thank you for the suggestion, indeed setting EnableCacheSynchronization to true seems to have corrected the problem.

@jmprieur I'm still curious as to why this issue hasn't been reported more. I'm trying to determine what we're doing differently. When our Angular app starts it's making about 5 calls to the same endpoint (we're using GraphQL and so we're passing a different query in each time) in parallel. The endpoint is a BFF API (controller methods in the web project) which in turn calls a downstream API (our own API application). We're also using claims transformation which also makes a call to the downstream API to get transformed claims for the user. We thought perhaps making 2 calls to the downstream API was causing the issue, but we did try eliminating the claims transformation and could still reproduce the issue.

Not sure if it's worth putting a note in the documentation somewhere outlining the solution/workaround, or potentially changing the default value of the setting to true, in case anyone else runs across this.

mevans-medpace avatar Jul 26 '22 14:07 mevans-medpace

Thanks @mevans-medpace Yes, I think that this is something we need to look at. A multi-threading issue? and at least document.

cc: @jennyf19 FYI

jmprieur avatar Jul 26 '22 15:07 jmprieur