microsoft-authentication-library-for-python
microsoft-authentication-library-for-python copied to clipboard
Inconsistent behavior between MSAL.PY and MSAL.NET while saving `organizations` tokens
MSAL.PY
While saving ATs to cache, MSAL.PY gets tenant_id (realm) from token_endpoint:
https://github.com/AzureAD/microsoft-authentication-library-for-python/blob/31b24afe3eb6edd3af58bab40f1387be02cb389d/msal/token_cache.py#L128
For organizations, token_endpoint is queried from
https://login.microsoftonline.com/organizations/v2.0/.well-known/openid-configuration
"token_endpoint":"https://login.microsoftonline.com/organizations/oauth2/v2.0/token"
MSAL.PY keeps the AT as "realm": "organizations".
MSAL.NET
On the other hand, MSAL.NET gets tenantId from idToken which will always be a GUID.
https://github.com/AzureAD/microsoft-authentication-library-for-dotnet/blob/3d9cb46d824820a580b7f826a71ecd5beb8131a8/src/client/Microsoft.Identity.Client/TokenCache.ITokenCacheInternal.cs#L46
var tenantId = GetTenantId(idToken, requestParams);
https://github.com/AzureAD/microsoft-authentication-library-for-dotnet/blob/3d9cb46d824820a580b7f826a71ecd5beb8131a8/src/client/Microsoft.Identity.Client/TokenCache.ITokenCacheInternal.cs#L65-L70
msalAccessTokenCacheItem =
new MsalAccessTokenCacheItem(
...
tenantId,
MSAL.NET keeps the AT as "realm": "54826b22-38d6-4fb2-bad9-b7b93a3e9c5a".
This aspect has been under-defined in MSAL libraries, but I think most MSALs will use the id token's tid claim to store tokens, because organizations (as well as common) is really a moniker for "I don't know the tenant, let AAD / the user figure it out". Once you get a token response from AAD, the tenant is known.
I intend to mark this issue as ByDesign/WontFix. And this is why.
MSAL Python's current behavior existed since day 1 (Dec 2018). During all these years, it was only noticed by our peers who were looking into the content inside token cache, but there have been zero github issues or ICMs from outside customers complaining about this, not even when/if the token cache file being shared by different MSALs. The current MSAL Python cache behavior was a delicate design. If it were a blocking sev1/sev2 bug, it wouldn't have survived that long.
This aspect has been under-defined in MSAL libraries, but I think most MSALs will use the id token's
tidclaim to store tokens, becauseorganizations(as well ascommon) is really a moniker for "I don't know the tenant, let AAD / the user figure it out". Once you get a token response from AAD, the tenant is known.
MSAL Python's token cache design can be said to be "tenant agnostic". The tenant string inside the authority is merely treated as an opaque string. MSAL Python only needs to make sure an app with authority https://l.m.c/FOO can hit their tokens in subsequent calls with same FOO string, but never hit tokens for tenant BAR, all without needing to understand the meaning of FOO (regardless of whether FOO is a "contoso.onmicrosoft.com" or guid or "organizations" or "common").
The benefit is that MSAL Python's token cache logic can be vastly simplified. For example, recently I participated in another code review, and did a self-audit and confirmed that MSAL Python does not need to implement this logic.