[Bug] System.Net.Http.HttpRequestException is thrown intermittently when acquiring a token

Open qinl-li opened this issue 1 year ago • 4 comments

Library version used

4.61.1

.NET version

net472

Scenario

ConfidentialClient - service to service (AcquireTokenForClient)

Is this a new or an existing app?

The app is in production, and I have upgraded to a new version of MSAL

Issue description and reproduction steps

After adopting ests-r, we observed intermittent process crashes in certain regions. We are currently investigating the root cause. The call stack indicates that the issue arises during token acquisition. The issue seems to have disappeared after turning off ests-r.

Below, I have included the call stack and the relevant code. Are there any known issues, or is there guidance for a possible solution? Should we apply retry logic here? Any help is greatly appreciated. Thanks!

Application: ServiceHost.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.

Exception Info: System.Net.Sockets.SocketException
   at System.Net.Sockets.Socket.EndReceive(System.IAsyncResult)
   at System.Net.Sockets.NetworkStream.EndRead(System.IAsyncResult)

Exception Info: System.IO.IOException
   at System.Net.Security._SslStream.EndRead(System.IAsyncResult)
   at System.Net.TlsStream.EndRead(System.IAsyncResult)
   at System.Net.Connection.ReadCallback(System.IAsyncResult)

Exception Info: System.Net.WebException
   at System.Net.HttpWebRequest.EndGetResponse(System.IAsyncResult)
   at System.Net.Http.HttpClientHandler.GetResponseCallback(System.IAsyncResult)

Exception Info: System.Net.Http.HttpRequestException
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
   at Microsoft.Identity.Client.Http.HttpManager+<ExecuteAsync>d__15.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
   at Microsoft.Identity.Client.Http.HttpManagerWithRetry+<SendRequestAsync>d__7.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
   at Microsoft.Identity.Client.Http.HttpManagerWithRetry+<SendPostAsync>d__1.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
   at Microsoft.Identity.Client.Http.HttpManager+<SendPostAsync>d__7.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
   at Microsoft.Identity.Client.OAuth2.OAuth2Client+<ExecuteRequestAsync>d__121[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
   at Microsoft.Identity.Client.OAuth2.OAuth2Client+<GetTokenAsync>d__11.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
   at Microsoft.Identity.Client.OAuth2.TokenClient+<SendHttpAndClearTelemetryAsync>d__11.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
   at Microsoft.Identity.Client.OAuth2.TokenClient+<SendTokenRequestAsync>d__5.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
   at Microsoft.Identity.Client.Internal.Requests.RequestBase+<SendTokenRequestAsync>d__25.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
   at Microsoft.Identity.Client.Internal.Requests.ClientCredentialRequest+<FetchNewAccessTokenAsync>d__3.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
   at Microsoft.Identity.Client.Internal.Requests.ClientCredentialRequest+<ExecuteAsync>d__2.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
   at Microsoft.Identity.Client.Internal.Requests.RequestBase+<RunAsync>d__12.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
   at Microsoft.Identity.Client.ApiConfig.Executors.ConfidentialClientExecutor+<ExecuteAsync>d__3.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
   at AzureSearch.Identity.RegionalClientCertificateCredential+<GetTokenAsync>d__4.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
   at Microsoft.WindowsAzure.Search.Core.CCSTokenValidationCredential+<GetTokenAsync>d__4.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
   at Microsoft.WindowsAzure.Search.Util.TokenCredentialExtensions+TokenCredentialState+<Renew>d__9.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
   at Microsoft.Azure.Storage.Auth.TokenCredential+<RenewTokenAsync>d__11.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()

Relevant code snippets

using System;
using System.Diagnostics;
using System.Security.Cryptography.X509Certificates;
using System.Threading;
using System.Threading.Tasks;
using Azure.Core;
using Azure.Identity;
using Microsoft.Identity.Client;

namespace AzureSearch.Identity
{
    public class RegionalClientCertificateCredential : TokenCredential
    {
        private readonly IConfidentialClientApplication _confidentialClient;
        private readonly bool _sendCertificateChain;

        public RegionalClientCertificateCredential(
            string authority, string tenantId, string clientId, X509Certificate2 clientCertificate, string region, bool sendCertificateChain)
        {
            _confidentialClient = ConfidentialClientApplicationBuilder.Create(clientId)
                                    .WithAuthority(authority, tenantId)
                                    .WithCertificate(clientCertificate)
                                    .WithAzureRegion(region)
                                    .Build();
            _sendCertificateChain = sendCertificateChain;
        }

        public override AccessToken GetToken(
            TokenRequestContext requestContext, CancellationToken cancellationToken) =>
                GetTokenAsync(requestContext, cancellationToken).GetAwaiter().GetResult();

        public override async ValueTask<AccessToken> GetTokenAsync(
            TokenRequestContext requestContext, CancellationToken cancellationToken)
        {
            AcquireTokenForClientParameterBuilder request = _confidentialClient.AcquireTokenForClient(requestContext.Scopes)
                                         .WithSendX5C(_sendCertificateChain);

            if (!String.IsNullOrEmpty(requestContext.TenantId))
            {
                request.WithTenantId(requestContext.TenantId);
            }

            if (!String.IsNullOrEmpty(requestContext.ParentRequestId) && Guid.TryParse(requestContext.ParentRequestId, out Guid guid))
            {
                request.WithCorrelationId(guid);
            }

            if (!String.IsNullOrEmpty(requestContext.Claims))
            {
                request.WithClaims(requestContext.Claims);
            }

            try
            {
                Log.TraceVerbose($"{nameof(RegionalClientCertificateCredential)} start to acquire token for tenant \"{requestContext.TenantId}\" with parent request ID \"{requestContext.ParentRequestId}\"");
                AuthenticationResult result = await request.ExecuteAsync(cancellationToken).ConfigureAwait(false);

                return new AccessToken(result.AccessToken, result.ExpiresOn);
            }
            catch (MsalServiceException ex)
            {
                // This follows the example from the identity SDK for wrapping MSAL exceptions, so
                // disabling this warning to remove noise
#pragma warning disable AZS0005 // This method is unsafe and should be avoided unless absolutely necessary
                throw new AuthenticationFailedException(ex.GetMessageRawUnsafe(), ex);
#pragma warning restore AZS0005 // This method is unsafe and should be avoided unless absolutely necessary
            }
        }
    }
}

Expected behavior

No response

Identity provider

Microsoft Entra ID (Work and School accounts and Personal Microsoft accounts)

Regression

No response

Solution and workarounds

No response

qinl-li, Jun 05 '24 23:06

The error does come from MSAL's HttpClient request to the STS, but the stack trace is not sufficient to understand what is happening. It could be an error from the server (a 400 or 401 response), which MSAL transforms into an MsalServiceException, a timeout, or a bad HTTP call (maybe the region string is bad?).

Next steps:

  • Can you try to isolate the exception message?
  • It would also help to have some basic logs from the affected machines.
  • Does the issue happen only in 1 region?
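
One way to isolate the message is to log the whole inner-exception chain at the point where the token is requested, since the crash dump above shows four nested exceptions (SocketException, IOException, WebException, HttpRequestException). The helper below is only a sketch; `ExceptionChain` and `Describe` are hypothetical names, not part of MSAL, and the output format is an arbitrary choice.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;

// Hypothetical helper (not an MSAL API): flattens an exception and all of
// its inner exceptions into a single log-friendly line.
public static class ExceptionChain
{
    public static string Describe(Exception ex)
    {
        if (ex == null) throw new ArgumentNullException(nameof(ex));

        var sb = new StringBuilder();
        for (Exception current = ex; current != null; current = current.InnerException)
        {
            if (current != ex) sb.Append(" -> ");
            sb.Append(current.GetType().FullName).Append(": ").Append(current.Message);

            // Transport-level exceptions carry a status code that is usually
            // more useful than the message text.
            if (current is WebException web)
                sb.Append(" (Status=").Append(web.Status).Append(')');
            else if (current is SocketException socket)
                sb.Append(" (SocketErrorCode=").Append(socket.SocketErrorCode).Append(')');
        }
        return sb.ToString();
    }
}
```

Calling `ExceptionChain.Describe(ex)` from a `catch (HttpRequestException ex)` block around `ExecuteAsync` would record the socket or TLS status alongside the correlation ID before the exception is rethrown.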

bgavrilMS, Jun 06 '24 11:06

Unfortunately, we don't have any isolated exception to investigate. (We will add more telemetry around it.) The call stack was captured from OS events during process crashes. According to our logs, it happened across 21 regions. Since ests-r has been turned off, we are now flagged in s360. Any guidance on next steps? Thanks!

qinl-li, Jun 06 '24 16:06

The guidance is to add logging and some monitoring, and to run in a pre-prod or low-traffic environment to identify the root cause.
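
For the logging part, here is a minimal sketch of wiring MSAL's log callback into the app builder, reusing the builder calls from the snippet in the issue description. `MsalLoggingSketch`, `BuildWithLogging`, and the `Trace.WriteLine` sink are placeholders for illustration; substitute whatever logger the service already uses. It needs the Microsoft.Identity.Client package.

```csharp
using System.Diagnostics;
using System.Security.Cryptography.X509Certificates;
using Microsoft.Identity.Client;

// Hypothetical factory for illustration: same builder chain as in the issue,
// plus WithLogging so MSAL's own HTTP and region diagnostics are captured.
public static class MsalLoggingSketch
{
    public static IConfidentialClientApplication BuildWithLogging(
        string authority, string tenantId, string clientId,
        X509Certificate2 clientCertificate, string region)
    {
        return ConfidentialClientApplicationBuilder.Create(clientId)
            .WithAuthority(authority, tenantId)
            .WithCertificate(clientCertificate)
            .WithAzureRegion(region)
            .WithLogging(
                (level, message, containsPii) =>
                {
                    // PII logging is disabled below; the check is defensive.
                    if (!containsPii)
                    {
                        Trace.WriteLine($"[MSAL {level}] {message}");
                    }
                },
                LogLevel.Info,
                enablePiiLogging: false,
                enableDefaultPlatformLogging: false)
            .Build();
    }
}
```

`LogLevel.Info` is a guess at a reasonable starting level for production; `LogLevel.Verbose` gives more detail in a pre-prod or low-traffic environment.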

How is it failing in the 21 regions? Do 100% of calls fail? That would indicate that the region strings are wrong. See here for region list.

Also, regional auth is fully tied to sending the certificate chain. Not sure why it's an option in your code, but that flag should always be true.

bgavrilMS, Jun 07 '24 11:06

The issue did happen across regions. However, it is not a 100% failure but an intermittent issue. I am working on adding the logs and monitoring and will let you know once I have more details.
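
Since the failures look intermittent, one stopgap while the root cause is investigated would be a bounded retry around the token call. This is only a sketch under assumptions: `TransientRetry` is a hypothetical helper, not an MSAL API, and treating every `HttpRequestException` as retryable is a simplification. The stack trace shows MSAL already goes through its internal `HttpManagerWithRetry`, so this would sit on top of whatever retry MSAL performs itself.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of MSAL): retries an async operation on
// HttpRequestException with exponential backoff, up to maxAttempts tries.
public static class TransientRetry
{
    public static async Task<T> ExecuteAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        int maxAttempts,
        TimeSpan baseDelay,
        CancellationToken cancellationToken)
    {
        if (operation == null) throw new ArgumentNullException(nameof(operation));
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation(cancellationToken).ConfigureAwait(false);
            }
            catch (HttpRequestException) when (attempt < maxAttempts)
            {
                // The filter lets the exception from the final attempt
                // propagate to the caller unchanged.
                double factor = Math.Pow(2, attempt - 1);
                TimeSpan delay = TimeSpan.FromMilliseconds(baseDelay.TotalMilliseconds * factor);
                await Task.Delay(delay, cancellationToken).ConfigureAwait(false);
            }
        }
    }
}
```

In `GetTokenAsync` this could wrap the call as `TransientRetry.ExecuteAsync(ct => request.ExecuteAsync(ct), 3, TimeSpan.FromMilliseconds(200), cancellationToken)`; the attempt count and delay are arbitrary values for illustration. Separately, the trace shows the exception escaping from a thread-pool work item, so whichever caller schedules the renewal still needs to catch the final failure to avoid terminating the process.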

Regarding the option of sending the certificate chain, I believe we can safely remove it, since it was enabled by default.

qinl-li, Jun 07 '24 22:06