microsoft-authentication-library-for-java

When global AAD is unavailable, AAD authentication against a national cloud times out.

Open yunbozhang-msft opened this issue 1 year ago • 10 comments

Hi team,

I cloned the MSAL4J code sample from this repo: ms-identity-java-webapp/msal-java-webapp-sample at master · Azure-Samples/ms-identity-java-webapp (github.com)

I set the AAD configuration in the application.properties file to point to the Azure China cloud. The endpoint is https://login.partner.microsoftonline.cn

I then ran the sample locally, and it ran successfully.

Next, I added an incorrect DNS mapping to the hosts file to make the global AAD endpoint inaccessible: image

After restarting the sample locally, I got a timeout error:

2023-03-06 12:08:42.147 ERROR 10572 --- [onPool-worker-1] c.m.a.m.ConfidentialClientApplication    : [Correlation ID: b4352a2f-2cbe-4bb9-82a6-ae860c0addb5] Execution of class com.microsoft.aad.msal4j.AcquireTokenByAuthorizationGrantSupplier failed.

com.microsoft.aad.msal4j.MsalClientException: java.net.SocketTimeoutException: Connect timed out
	at com.microsoft.aad.msal4j.HttpHelper.executeHttpRequest(HttpHelper.java:53) ~[msal4j-1.13.5.jar:1.13.5]
	at com.microsoft.aad.msal4j.AadInstanceDiscoveryProvider.executeRequest(AadInstanceDiscoveryProvider.java:278) ~[msal4j-1.13.5.jar:1.13.5]
	at com.microsoft.aad.msal4j.AadInstanceDiscoveryProvider.sendInstanceDiscoveryRequest(AadInstanceDiscoveryProvider.java:235) ~[msal4j-1.13.5.jar:1.13.5]
	at com.microsoft.aad.msal4j.AadInstanceDiscoveryProvider.doInstanceDiscoveryAndCache(AadInstanceDiscoveryProvider.java:339) ~[msal4j-1.13.5.jar:1.13.5]
	at com.microsoft.aad.msal4j.AadInstanceDiscoveryProvider.getMetadataEntry(AadInstanceDiscoveryProvider.java:88) ~[msal4j-1.13.5.jar:1.13.5]
	at com.microsoft.aad.msal4j.AuthenticationResultSupplier.getAuthorityWithPrefNetworkHost(AuthenticationResultSupplier.java:39) ~[msal4j-1.13.5.jar:1.13.5]
	at com.microsoft.aad.msal4j.AcquireTokenByAuthorizationGrantSupplier.execute(AcquireTokenByAuthorizationGrantSupplier.java:59) ~[msal4j-1.13.5.jar:1.13.5]
	at com.microsoft.aad.msal4j.AuthenticationResultSupplier.get(AuthenticationResultSupplier.java:69) ~[msal4j-1.13.5.jar:1.13.5]
	at com.microsoft.aad.msal4j.AuthenticationResultSupplier.get(AuthenticationResultSupplier.java:18) ~[msal4j-1.13.5.jar:1.13.5]
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1760) ~[na:na]
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373) ~[na:na]
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182) ~[na:na]
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655) ~[na:na]
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622) ~[na:na]
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165) ~[na:na]
Caused by: java.net.SocketTimeoutException: Connect timed out
	at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:546) ~[na:na]
	at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:597) ~[na:na]
	at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327) ~[na:na]
	at java.base/java.net.Socket.connect(Socket.java:633) ~[na:na]
	at java.base/sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:304) ~[na:na]
	at java.base/sun.net.NetworkClient.doConnect(NetworkClient.java:178) ~[na:na]
	at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:532) ~[na:na]
	at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:637) ~[na:na]
	at java.base/sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:266) ~[na:na]
	at java.base/sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:380) ~[na:na]
	at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:193) ~[na:na]
	at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1242) ~[na:na]
	at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1128) ~[na:na]
	at java.base/sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:179) ~[na:na]
	at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1665) ~[na:na]
	at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589) ~[na:na]
	at java.base/java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:529) ~[na:na]
	at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:308) ~[na:na]
	at com.microsoft.aad.msal4j.DefaultHttpClient.readResponseFromConnection(DefaultHttpClient.java:105) ~[msal4j-1.13.5.jar:1.13.5]
	at com.microsoft.aad.msal4j.DefaultHttpClient.executeHttpGet(DefaultHttpClient.java:47) ~[msal4j-1.13.5.jar:1.13.5]
	at com.microsoft.aad.msal4j.DefaultHttpClient.send(DefaultHttpClient.java:35) ~[msal4j-1.13.5.jar:1.13.5]
	at com.microsoft.aad.msal4j.HttpHelper.executeHttpRequestWithRetries(HttpHelper.java:96) ~[msal4j-1.13.5.jar:1.13.5]
	at com.microsoft.aad.msal4j.HttpHelper.executeHttpRequest(HttpHelper.java:49) ~[msal4j-1.13.5.jar:1.13.5]
	... 15 common frames omitted

Why does a client configured for a national cloud access global AAD endpoints? The global AAD service has had problems before, and whenever global AAD is unavailable, this behavior also affects clients of the national cloud AAD (such as Azure China AAD).

Thanks!

yunbozhang-msft avatar Mar 06 '23 05:03 yunbozhang-msft

Adding Bogdan's comment from the Incident

This is a good point. There are 2 issues here:

  1. If instance discovery fails with any error other than "invalid_instance", MSAL should ignore it

  2. Once instance discovery fails, MSAL should not re-attempt to perform instance discovery on that environment

I suggest we track this via a bug, as it will require a fix in the library.
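
To make the two points concrete, here is a minimal, JDK-only sketch of the behavior being proposed. It is not MSAL4J code: the class InstanceDiscoverySketch, the DiscoveryException type, the "connect_timeout" error code, and the idea of falling back to the configured host are all assumptions made for illustration. Only the "invalid_instance" error code comes from the comment above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch of the proposed behavior; none of these names exist in MSAL4J.
final class InstanceDiscoverySketch {

    // Hypothetical stand-in for a failed instance discovery call.
    static final class DiscoveryException extends RuntimeException {
        final String errorCode;

        DiscoveryException(String errorCode) {
            super(errorCode);
            this.errorCode = errorCode;
        }
    }

    // Stand-in for the network call: maps the configured host to a preferred network host.
    private final Function<String, String> discoveryCall;
    // One entry per environment (host), written after the first attempt, successful or not.
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Counts discovery attempts so the "no retry" behavior can be observed.
    final AtomicInteger networkCalls = new AtomicInteger();

    InstanceDiscoverySketch(Function<String, String> discoveryCall) {
        this.discoveryCall = discoveryCall;
    }

    String preferredHost(String configuredHost) {
        String cached = cache.get(configuredHost);
        if (cached != null) {
            return cached;
        }
        String result;
        try {
            networkCalls.incrementAndGet();
            result = discoveryCall.apply(configuredHost);
        } catch (DiscoveryException e) {
            // Point 1: only "invalid_instance" is treated as fatal.
            if ("invalid_instance".equals(e.errorCode)) {
                throw e;
            }
            // Any other failure is ignored; fall back to the configured host (assumption).
            result = configuredHost;
        }
        // Point 2: the outcome is cached even after a failure, so discovery
        // is not attempted again for this environment.
        cache.put(configuredHost, result);
        return result;
    }

    public static void main(String[] args) {
        InstanceDiscoverySketch sketch = new InstanceDiscoverySketch(host -> {
            throw new DiscoveryException("connect_timeout"); // simulated outage
        });
        System.out.println(sketch.preferredHost("login.partner.microsoftonline.cn"));
        System.out.println(sketch.preferredHost("login.partner.microsoftonline.cn"));
        System.out.println("discovery attempts: " + sketch.networkCalls.get());
    }
}
```

In this sketch a timeout costs the caller one failed attempt per environment instead of one per token request, which is the effect the two points describe.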

siddhijain avatar Mar 06 '23 15:03 siddhijain

This issue is fixed and the fix should be available in the next msal4j release.

siddhijain avatar Mar 13 '23 21:03 siddhijain

Released version 1.13.6 of the library to take care of this. Please reopen this if the issue persists.

siddhijain avatar Mar 24 '23 20:03 siddhijain

Thanks team

yunbozhang-msft avatar Mar 25 '23 01:03 yunbozhang-msft

Hi @siddhijain I tested this locally and the error is still reported. I think we still need a PR to fix this issue, because some users' networks are restricted and may block parts of the global Azure network. Also, I do not see any PR linked to this issue, so please check whether the PR was never merged or committed. Thanks!

FYI: Error stack: image

yunbozhang-msft avatar Nov 07 '23 01:11 yunbozhang-msft

Also, I found that I do not have permission to reopen this issue. Could you please reopen it? Thanks! @siddhijain

yunbozhang-msft avatar Nov 07 '23 01:11 yunbozhang-msft

Hello @zhangyunbo1994 : It's been some time since you first reported this issue, so just to clarify: is this a problem that started happening for some new users/scenarios, or was the original issue completely unresolved (and you only tested it recently)? Just trying to figure out if there's an edge case we didn't cover, or if we may have misunderstood the root cause.

Also, I believe this was the PR with the fix: https://github.com/AzureAD/microsoft-authentication-library-for-java/pull/606

Avery-Dunn avatar Nov 07 '23 15:11 Avery-Dunn

Hi @Avery-Dunn The original issue is completely unresolved. I also tested it recently, and it is not resolved in the latest SDK.

yunbozhang-msft avatar Nov 08 '23 03:11 yunbozhang-msft

If instance discovery fails with 404, MSAL should ignore this. We do not guarantee that MSAL won't call public cloud.

As a workaround:

  • you can disable instance discovery via .instanceDiscovery(false) when you create the application object. See https://github.com/AzureAD/microsoft-authentication-library-for-java/pull/569/files
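
A minimal sketch of that workaround, assuming a confidential client that authenticates with a client secret and an msal4j version on the classpath that includes the change from the linked PR. The class name ChinaCloudClientSketch and the clientId, clientSecret and tenantId parameters are placeholders for illustration; the authority host is the Azure China endpoint mentioned earlier in this thread.

```java
import com.microsoft.aad.msal4j.ClientCredentialFactory;
import com.microsoft.aad.msal4j.ConfidentialClientApplication;

import java.net.MalformedURLException;

// Hypothetical helper class; the names and parameters are placeholders.
final class ChinaCloudClientSketch {

    static ConfidentialClientApplication build(String clientId, String clientSecret, String tenantId)
            throws MalformedURLException {
        // Azure China authority, as configured in the original report.
        String authority = "https://login.partner.microsoftonline.cn/" + tenantId + "/";

        return ConfidentialClientApplication
                .builder(clientId, ClientCredentialFactory.createFromSecret(clientSecret))
                .authority(authority)
                // Ask MSAL not to perform instance discovery for this application object.
                .instanceDiscovery(false)
                .build();
    }
}
```

As noted above, this is not a guarantee that MSAL never calls the public cloud, so it is worth confirming the behavior on a network where the global endpoint is blocked.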

bgavrilMS avatar Nov 08 '23 12:11 bgavrilMS

Hi @bgavrilMS thanks!

I tried the workaround locally, but it still times out, so I think the back end still tries to connect to the AAD public endpoint even with instanceDiscovery set to false.

image

yunbozhang-msft avatar Nov 09 '23 03:11 yunbozhang-msft