
oidc backend - preferred_username not used to provision users, they end up with a random hex username

Open yrro opened this issue 3 years ago • 17 comments

Expected behaviour

When using the oidc backend to authenticate against Azure AD, users are provisioned with a username based on the preferred_username claim.

Actual behaviour

Users are provisioned with a random hexadecimal username.

What are the steps to reproduce this issue?

I'm reproducing this via NetBox which uses python-social-auth. I don't believe this issue is related to how NetBox configures python-social-auth, but I can dig into any configuration details if it's helpful.

  1. Create an app registration in Azure AD
  2. Add a client secret to the app registration
  3. Configure NetBox's configuration.py with:
    REMOTE_AUTH_BACKEND = 'social_core.backends.open_id_connect.OpenIdConnectAuth'
    SOCIAL_AUTH_OIDC_OIDC_ENDPOINT = 'https://login.microsoftonline.com/{tenant id}/v2.0'
    SOCIAL_AUTH_OIDC_KEY = '{app registration id}'
  4. Run netbox/manage.py runserver
  5. Initiate login by visiting http://localhost:8000/oauth/login/oidc/
  6. Authenticate with Azure AD & get redirected back to NetBox
  7. Look at the top right of the screen - the username is in hex.

Any logs, error output, etc?

I patched the social_core.backends.open_id_connect.OpenIdConnectAuth.get_user_details method to produce some debugging output:

    def get_user_details(self, response):
        username_key = self.setting('USERNAME_KEY', self.USERNAME_KEY)
        print('username_key', username_key)
        from pprint import pprint
        print('response:')
        pprint(dict(response.items()))
        return {
            'username': response.get(username_key),
            'email': response.get('email'),
            'fullname': response.get('name'),
            'first_name': response.get('given_name'),
            'last_name': response.get('family_name'),
            'groups': response.get('groups'),    # not standardized but widely implemented
        }

This results in the following output:

username_key preferred_username
response:
{'access_token': '{an access token}',
 'email': '{an email address}',
 'expires_in': 4008,
 'ext_expires_in': 4008,
 'family_name': 'Name',
 'given_name': 'My',
 'id_token': '{an id token}',
 'name': 'My Name',
 'picture': 'https://graph.microsoft.com/v1.0/me/photo/$value',
 'scope': 'profile openid email User.Read',
 'sub': '{a sub claim}',
 'token_type': 'Bearer'}

If I take the id token from the above and decode it by hand (split it into three parts separated by . and base64-decode the 2nd part), I can see the id token returned by Azure AD, and it does include the preferred_username claim. But the response passed in by social_core.pipeline.social_auth.social_details is missing the preferred_username key.
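
The manual decode described above can be sketched roughly as follows. This is a debugging aid only: it does not verify the signature, and decode_jwt_payload is a hypothetical helper name, not part of social-core or PyJWT.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Return the claims of a JWT without verifying its signature (debugging only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated segments")
    payload = parts[1]
    # JWT segments are base64url-encoded without padding; restore it first.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

Calling decode_jwt_payload(response['id_token']) on the value from the debug output should then show whether preferred_username is among the claims.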

For the record, the id token contains the following claims:

  • aud
  • iss
  • iat
  • nbf
  • exp
  • email
  • name
  • nonce
  • oid
  • preferred_username
  • rh
  • sub
  • tid
  • uti
  • ver

And the access token contains the following claims:

  • aud
  • email
  • exp
  • iat
  • iss
  • name
  • nbf
  • nonce
  • oid
  • preferred_username
  • rh
  • sub
  • tid
  • uti
  • ver

The ver for both tokens is 2.0.

Any other comments?

I haven't dug through the code yet to see exactly how the response is populated & whether it's simply a case that this claim was never extracted from the id token. I'll continue to dig into it and update this issue with what I find.

But I thought I'd file the issue now, in case someone else who's more familiar with the code can take a quick look and say whether my guess about the preferred_username not being copied out of the id token is correct.

Oh, and I can't just hack NetBox to configure python-social-auth to use the email as the username, because my users don't all have email addresses.

yrro, Sep 01 '22 10:09

I've figured out that response is the result of fetching an additional v1.0 access token (I don't understand why), with the resulting dict updated with the result of calling the userinfo endpoint (that's where the extra fields like given_name come from).

I have hacked this into my NetBox configuration.py file:

REMOTE_AUTH_BACKEND = 'netbox.configuration.OpenIdConnectAuth'

from social_core.backends import open_id_connect
class OpenIdConnectAuth(open_id_connect.OpenIdConnectAuth):
    '''
    https://github.com/python-social-auth/social-core/issues/709
    '''

    def get_user_details(self, response):
        '''
        The stock OpenIdConnectAuth configures response to be the result of the
        call to the 'get access token' endpoint, with the result of the call to
        the 'get user info' endpoint sprinkled in.

        It doesn't include the actual decoded access token or id token
        provided by Azure AD!
        '''

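        import jwt as realjwt
        from social_core.exceptions import AuthTokenError
        try:
            # The signature is deliberately not verified: the token is trusted as-is.
            decoded_id_token = realjwt.decode(response['id_token'], options={
                'verify_signature': False
            })
        except (realjwt.DecodeError, realjwt.ExpiredSignatureError) as de:
            raise AuthTokenError(self, de)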

        return {
            'username': decoded_id_token['preferred_username'],
            'email': response.get('email'),
            'fullname': response.get('name'),
            'first_name': response.get('given_name'),
            'last_name': response.get('family_name'),
            'groups': response.get('groups'),
        }

Which is ugly but it works. I'm trusting that the id token handed directly to me by Azure AD is not malicious, hence not validating it (in fact I lifted the code from the AzureADOAuth2.user_data method). This token decode is probably better done in another method (user_data?). But then there's already a validate_and_return_id_token method, which does return the claims from the id token; it's only called by the request_access_token method, and the decoded claims are not put into response, but are instead saved into a field of the OpenIdConnectAuth instance!?! This is all very confusing...

yrro, Sep 01 '22 16:09

This does the same thing but more cleanly, by relying on the id_token attribute set on the backend instance in its request_access_token method.

REMOTE_AUTH_BACKEND = 'netbox.configuration.OpenIdConnectAuth'

from social_core.backends import open_id_connect
class OpenIdConnectAuth(open_id_connect.OpenIdConnectAuth):
    '''
    https://github.com/python-social-auth/social-core/issues/709
    '''

    def get_user_details(self, response):
        '''
        The stock OpenIdConnectAuth configures response to be the result of the
        call to the 'get access token' endpoint, which gives us a v1.0 access
        token for some reason. It then mixes in the result of the call to the
        'get user info' endpoint.

        As a result, 'preferred_username' will never make it into response.

        It turns out that OpenIdConnectAuth does decode & validate the original
        id token, and stores it as an attribute on itself; it makes no further
        use of the id token, but we can use that attribute to obtain values
        from the original id token and use them to provision a user with a
        username based on the preferred_username claim.
        '''
            
        return {
            'username': self.id_token['preferred_username'],
            'email': response.get('email'),
            'fullname': response.get('name'),
            'first_name': response.get('given_name'),
            'last_name': response.get('family_name'),
            'groups': response.get('groups'),
        }

yrro, Sep 01 '22 17:09

I have an idea about why OpenIdConnectAuth receives a v1.0 token instead of a v2.0 token. It turns out there is a hidden property on app registrations, accessTokenAcceptedVersion. You can't display or change the value of this property in the Azure Portal, and it defaults to being unset, which means the app gets a v1.0 token. Good grief.

Once you know what to search for you can find this documented here.

I'm going to try changing the value of this attribute to 2 and then see if my subclass for OpenIdConnectAuth is no longer necessary. If so I'll close this issue. Though it still seems a bit weird how the access token claims are put into the response and the identity token claims aren't used. But I probably need an OpenID Connect expert to take a look and give me the answer... ;)

yrro, Sep 02 '22 09:09

Well, even after setting accessTokenAcceptedVersion to 2, the access_token in the response is a v1.0 access token!

Not that it actually matters anyway -- on closer inspection the claims from response["access_token"] aren't decoded & copied to response after all. That was tired me talking. response is the result of calling the userinfo endpoint, + some other stuff.

So there's no way that preferred_username will ever get into response. So any code that expects it to be there will never find it.
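
For context, a simplified model of why a missing key ends up as a hex username. It assumes the username step of the pipeline falls back to uuid4().hex when details carries no username; pick_username is a hypothetical name for illustration, not social-core's actual function, and the real step handles more cases (for example using the email as the username, and collisions).

```python
from uuid import uuid4


def pick_username(details: dict) -> str:
    """Simplified model: use the provided username, else fall back to a random hex string."""
    username = details.get("username")
    if username:
        return username
    # Assumed fallback: a 32-character hexadecimal string from a random UUID.
    return uuid4().hex
```

With the response shown earlier, response.get('preferred_username') is None, so details['username'] is None and the fallback branch is taken.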

However, it's easy enough to override the class as above in order to access the decoded claims via self.id_token. I think my modified get_user_details method is usable as is; I wonder if someone who knows more about OAuth/OpenID Connect can comment.

An alternative could be: copy claims from id_token into response before the values from the 'get user info' endpoint are copied in. That way the get_user_details method doesn't need to be changed, as it will have access to both the claims from the id token & the result of the 'get user info' endpoint.
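
A minimal sketch of that alternative, under two assumptions taken from this thread: that user_data is where the userinfo result is fetched, and that self.id_token holds the decoded id token claims after request_access_token has run. merge_claims and IdTokenClaimsMixin are hypothetical names, and this has not been tested against a real social-core install.

```python
def merge_claims(id_token_claims, userinfo):
    """Combine id token claims with the userinfo result; userinfo wins on conflicts."""
    merged = dict(id_token_claims or {})
    merged.update(userinfo or {})
    return merged


class IdTokenClaimsMixin:
    """Hypothetical mixin: list it before the OIDC backend class in the bases."""

    def user_data(self, access_token, *args, **kwargs):
        userinfo = super().user_data(access_token, *args, **kwargs)
        # self.id_token is assumed to be the decoded claims stored on the
        # backend instance, as described earlier in this thread.
        return merge_claims(getattr(self, "id_token", None), userinfo)
```

It would be combined with the stock backend as class OpenIdConnectAuth(IdTokenClaimsMixin, open_id_connect.OpenIdConnectAuth), leaving get_user_details unchanged. Copying the id token claims in first means the userinfo values still take precedence wherever both sources provide the same key.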

yrro, Sep 02 '22 18:09

Note to self: there's an Azure AD specific class available, social_core.backends.azuread_tenant.AzureADV2TenantOAuth2, which might be better to override. It uses the preferred_username claim for the user ID. But it also uses the name claim for the username(?) and it incorrectly also uses preferred_username for email, instead of email.

yrro, Aug 02 '23 10:08

Hi @yrro - were you able to solve your issue? I've run into something similar: neither the roles nor the groups claim is recognized at all when I use SOCIAL_AUTH_OIDC_OIDC_ENDPOINT = 'https://login.microsoftonline.com/{redacted_mytenantid}/v2.0', which is the right endpoint according to the App Registration docs. But when I set it to https://login.microsoftonline.com/{redacted_mytenantid} I see roles and groups (I've set the App roles according to https://learn.microsoft.com/en-us/entra/identity-platform/howto-add-app-roles-in-apps).

The main difference is of course 'ver': '1.0' vs 'ver': '2.0' so this corresponds to https://nicolgit.github.io/AzureAD-Endopoint-V1-vs-V2-comparison/

themysteq, Mar 22 '24 21:03

I haven't had a chance to take another look at this. According to the ID token claims reference, the roles claim is always present in both v1.0 and v2.0 tokens. According to the Optional claims reference, the groups claim is only present in v1.0 or v2.0 tokens if the app registration is configured to include it.

yrro, Mar 23 '24 10:03

Hello

I had the same issue, but I found in the source code that we can specify the username key in configuration.py:

SOCIAL_AUTH_OIDC_USERNAME_KEY = "email"

This sets the username from the email claim. I also tried other keys to see if it works, and it does.

    def get_user_details(self, response):
        username_key = self.setting("USERNAME_KEY", self.USERNAME_KEY)
        return {
            "username": response.get(username_key),
            "email": response.get("email"),
            "fullname": response.get("name"),
            "first_name": response.get("given_name"),
            "last_name": response.get("family_name"),
        }

https://github.com/python-social-auth/social-core/blob/d7bba223c0036581b63b01d05e53b115c606dbec/social_core/backends/open_id_connect.py#L264

Hope it helps

BaptisteGallet, Jun 12 '24 12:06