microsoft-authentication-library-for-python icon indicating copy to clipboard operation
microsoft-authentication-library-for-python copied to clipboard

Inconsistent behavior between MSAL.PY and MSAL.NET while saving `organizations` tokens

Open jiasli opened this issue 4 years ago • 1 comments

MSAL.PY

While saving ATs to cache, MSAL.PY gets tenant_id (realm) from token_endpoint:

https://github.com/AzureAD/microsoft-authentication-library-for-python/blob/31b24afe3eb6edd3af58bab40f1387be02cb389d/msal/token_cache.py#L128

For organizations, token_endpoint is queried from

https://login.microsoftonline.com/organizations/v2.0/.well-known/openid-configuration

"token_endpoint":"https://login.microsoftonline.com/organizations/oauth2/v2.0/token"

MSAL.PY keeps the AT as "realm": "organizations".

MSAL.NET

On the other hand, MSAL.NET gets tenantId from idToken which will always be a GUID.

https://github.com/AzureAD/microsoft-authentication-library-for-dotnet/blob/3d9cb46d824820a580b7f826a71ecd5beb8131a8/src/client/Microsoft.Identity.Client/TokenCache.ITokenCacheInternal.cs#L46

            var tenantId = GetTenantId(idToken, requestParams);

https://github.com/AzureAD/microsoft-authentication-library-for-dotnet/blob/3d9cb46d824820a580b7f826a71ecd5beb8131a8/src/client/Microsoft.Identity.Client/TokenCache.ITokenCacheInternal.cs#L65-L70

                msalAccessTokenCacheItem =
                    new MsalAccessTokenCacheItem(
                        ...
                        tenantId,

MSAL.NET keeps the AT as "realm": "54826b22-38d6-4fb2-bad9-b7b93a3e9c5a".

jiasli avatar Apr 07 '21 10:04 jiasli

This aspect has been under-defined in MSAL libraries, but I think most MSALs will use the id token's tid claim to store tokens, because organizations (as well as common) is really a moniker for "I don't know the tenant, let AAD / the user figure it out". Once you get a token response from AAD, the tenant is known.

bgavrilMS avatar Apr 07 '21 15:04 bgavrilMS

I intend to mark this issue as ByDesign/WontFix. And this is why.

MSAL Python's current behavior existed since day 1 (Dec 2018). During all these years, it was only noticed by our peers who were looking into the content inside token cache, but there have been zero github issues or ICMs from outside customers complaining about this, not even when/if the token cache file being shared by different MSALs. The current MSAL Python cache behavior was a delicate design. If it were a blocking sev1/sev2 bug, it wouldn't have survived that long.

This aspect has been under-defined in MSAL libraries, but I think most MSALs will use the id token's tid claim to store tokens, because organizations (as well as common) is really a moniker for "I don't know the tenant, let AAD / the user figure it out". Once you get a token response from AAD, the tenant is known.

MSAL Python's token cache design can be said to be "tenant agnostic". The tenant string inside the authority is merely treated as an opaque string. MSAL Python only needs to make sure an app with authority https://l.m.c/FOO can hit their tokens in subsequent calls with same FOO string, but never hit tokens for tenant BAR, all without needing to understand the meaning of FOO (regardless of whether FOO is a "contoso.onmicrosoft.com" or guid or "organizations" or "common").

The benefit is that MSAL Python's token cache logic can be vastly simplified. For example, recently I participated in another code review, and did a self-audit and confirmed that MSAL Python does not need to implement this logic.

rayluo avatar Jan 23 '23 20:01 rayluo