authlib
CSRF Warning! State not equal in request and response airflow -keycloak
Describe the bug
It happens when using authlib to configure Keycloak for Airflow. Everything works perfectly up until redirecting back from Keycloak to Airflow.
Error Stacks
Something bad has happened.
Please consider letting us know by creating a [bug report using GitHub](https://github.com/apache/airflow/issues/new/choose).
Python version: 3.6.15
Airflow version: 2.1.4
Node: airflow-webserver-66fbff449c-wc8ht
-------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.6/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/airflow/.local/lib/python3.6/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/airflow/.local/lib/python3.6/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/airflow/.local/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/home/airflow/.local/lib/python3.6/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/airflow/.local/lib/python3.6/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/airflow/.local/lib/python3.6/site-packages/flask_appbuilder/security/views.py", line 655, in oauth_authorized
    resp = self.appbuilder.sm.oauth_remotes[provider].authorize_access_token()
  File "/home/airflow/.local/lib/python3.6/site-packages/authlib/integrations/flask_client/apps.py", line 102, in authorize_access_token
    params = self._format_state_params(state_data, params)
  File "/home/airflow/.local/lib/python3.6/site-packages/authlib/integrations/base_client/sync_app.py", line 234, in _format_state_params
    raise MismatchingStateError()
authlib.integrations.base_client.errors.MismatchingStateError: mismatching_state: CSRF Warning! State not equal in request and response.
To Reproduce
A minimal example to reproduce the behavior. This is my code:
import os
import json
import logging
from flask import session
from airflow.www.security import AirflowSecurityManager
from flask_appbuilder.security.manager import AUTH_OAUTH
from flask import get_flashed_messages, request, redirect, flash
from flask_appbuilder import expose
from flask_appbuilder._compat import as_unicode
from flask_appbuilder.security.views import AuthView
from flask_login import login_user, logout_user
from airflow import configuration as conf
from airflow.www.security import AirflowSecurityManager
# The SQLAlchemy connection string.
SQLALCHEMY_DATABASE_URI = conf.get('core', 'SQL_ALCHEMY_CONN')
#log = logging.getLogger(__name__)
MY_PROVIDER = 'keycloak'
CLIENT_ID = 'airflow'
CLIENT_SECRET = 'LDKlkdqwkdowkdokwodok2'
KEYCLOAK_BASE_URL = 'https://keyclk.xxx.io/auth/realms/Tata'
KEYCLOAK_TOKEN_URL = 'https://keyclk.xxx.io/auth/realms/Tata/protocol/openid-connect/token'
KEYCLOAK_AUTH_URL = 'https://keyclk.xxx.io/auth/realms/Tata/protocol/openid-connect/auth'
#KEYCLOAK_API_URL = 'https://keyclk.xxx.io/auth/realms/Tata'
KEYCLOAK_API_URL = 'https://keyclk.xxx.io/auth/realms/Tata/protocol/openid-connect/'
AUTH_TYPE = AUTH_OAUTH
AUTH_USER_REGISTRATION = True
AUTH_USER_REGISTRATION_ROLE = "Public"
AUTH_ROLES_SYNC_AT_LOGIN = True
CSRF_ENABLED = True
#PERMANENT_SESSION_LIFETIME = 1800
# a mapping from the values of `userinfo["role_keys"]` to a list of FAB roles
AUTH_ROLES_MAPPING = {
    "airflow_admin": ["Admin"],
    "airflow_op": ["Op"],
    "airflow_user": ["User"],
    "airflow_viewer": ["Viewer"],
    "airflow_public": ["Public"],
}
OAUTH_PROVIDERS = [
    {
        'name': 'keycloak',
        'icon': 'fa-circle-o',
        'token_key': 'access_token',
        'remote_app': {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'api_base_url': KEYCLOAK_BASE_URL,
            'response_type': 'code',
            'grant_type': 'authorization_code',
            'client_kwargs': {
                'scope': 'email profile openid roles'
            },
            'request_token_url': None,
            'access_token_url': KEYCLOAK_TOKEN_URL,
            'authorize_url': KEYCLOAK_AUTH_URL,
            'userinfo_endpoint': 'https://keyclk.xxx.io/auth/realms/Tata/protocol/openid-connect/userinfo',
            'logout_redirect_url': 'https://keyclk.xxx.io/auth/realms/Tata/protocol/openid-connect/logout'
        }
    }
]
Expected behavior
Airflow redirects the user to the Keycloak authentication page as expected. After authenticating and being redirected back to Airflow, the user should be logged in. Instead, the error "CSRF Warning! State not equal in request and response" occurs.
Environment:
- OS:
- Python Version: 3.6.15
- Authlib Version: 1.0.0
Airflow runs on a Kubernetes cluster and Keycloak runs in an ECS Fargate container within the same VPC in AWS.
Additional context
Tried on different browsers and in incognito mode, but it still does not work.
Hi,
Are there any updates on this issue?
Thanks
Can you create a runnable example? I'm not familiar with airflow.
Hi,
You will need an AWS account. In it, create a VPC, install an EKS cluster, and install Keycloak running in an ECS container. Airflow has to be installed on Kubernetes using Helm charts.
If this is not possible for you, we can have a screen-sharing session, or I can share additional logs. I have enabled additional FAB logging for Airflow. It shows that authorization happens and that authlib then detects a state mismatch.
Thanks
Hi, I've seen an issue like this before; it was caused by the session not being set properly. Can you check your session, which is based on a secure cookie? Check whether the server can read the session value and whether the browser holds the session data.
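One way to start that check is to look at the `Set-Cookie` header the webserver sends before the redirect to the identity provider. The following is a minimal sketch using only the standard library. The helper name `diagnose_session_cookie` and the cookie name `session` are assumptions for illustration, not part of Authlib, Flask, or Airflow. It assumes Python 3.8 or newer, where `http.cookies` understands the `SameSite` attribute, and it only flags two attribute combinations that can keep a browser from returning the cookie on the callback.

```python
from http.cookies import SimpleCookie


def diagnose_session_cookie(set_cookie_header, callback_scheme):
    """Return warnings about cookie attributes that can stop the browser
    from sending the session cookie back on the OAuth callback.

    Hypothetical helper for illustration only.
    """
    cookie = SimpleCookie()
    cookie.load(set_cookie_header)
    if not cookie:
        return ["no cookie could be parsed from the header"]

    warnings = []
    for name, morsel in cookie.items():
        # A Secure cookie is only sent over HTTPS.
        if morsel.get("secure") and callback_scheme != "https":
            warnings.append(
                f"{name}: Secure cookie, but the callback uses {callback_scheme}"
            )
        # SameSite=Strict cookies are withheld on cross-site navigations,
        # which includes the redirect back from the identity provider.
        if str(morsel.get("samesite", "")).lower() == "strict":
            warnings.append(
                f"{name}: SameSite=Strict can drop the cookie on the redirect back"
            )
    return warnings


# Example header values; copy the real one from the browser's network tab.
print(diagnose_session_cookie("session=abc; Secure; SameSite=Strict", "http"))
print(diagnose_session_cookie("session=abc; HttpOnly; Path=/; SameSite=Lax", "https"))
```

An empty list does not prove the session is healthy. It only rules out these two causes, so it is still worth confirming in the browser that the same session cookie is present on both the login request and the callback request.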
I am reasonably sure this is a bug.
I am using Django as a client against a server that's not Google, Twitter or Facebook and I'm getting CSRF session mismatch errors when calling authorize_access_token().
In my case this appears to be coming from framework.get_state_data() as it is looking for a key in this form f'_state_{self.name}_{state}' when request.session doesn't have a key in that form.
My guess is when using this library with one of a handful of known OAuth providers the key for the CSRF token in the request.session is in the form f'_state_{self.name}_{state}' and so things might be able to work.
But in a Django context, when using Authlib as a client, there is a bug.
- The bug: it is assumed the token is in request.session and is referenced by a key in the form f'_state_{self.name}_{state}'.
If the CSRF token is in my request.session at all, it's going to be keyed as 'state'.
We can see in the code snippets below...
authorize_access_token sets the session key in params to 'state' then calls get_state_data passing request.session and the value of the session token.
get_state_data tries to recover a session token value using a key in the form f'_state_{self.name}_{state}'. In my case it will always return None.
Then authorize_access_token calls _format_state_params(state_data, params) where state_data is None and our MismatchingStateError is raised.
Apps.py
class DjangoOAuth2App(DjangoAppMixin, OAuth2Mixin, OpenIDMixin, BaseApp):
    client_cls = OAuth2Session

    def authorize_access_token(self, request, **kwargs):
        """Fetch access token in one step.

        :param request: HTTP request instance from Django view.
        :return: A token dict.
        """
        if request.method == 'GET':
            error = request.GET.get('error')
            if error:
                description = request.GET.get('error_description')
                raise OAuthError(error=error, description=description)
            params = {
                'code': request.GET.get('code'),
                'state': request.GET.get('state'),
            }
        else:
            params = {
                'code': request.POST.get('code'),
                'state': request.POST.get('state'),
            }

        state_data = self.framework.get_state_data(request.session, params.get('state'))
        self.framework.clear_state_data(request.session, params.get('state'))
        params = self._format_state_params(state_data, params)
        token = self.fetch_access_token(**params, **kwargs)

        if 'id_token' in token and 'nonce' in state_data:
            userinfo = self.parse_id_token(token, nonce=state_data['nonce'])
            token['userinfo'] = userinfo
        return token
framework_integration.py
    def get_state_data(self, session, state):
        key = f'_state_{self.name}_{state}'
        if self.cache:
            value = self._get_cache_data(key)
        else:
            value = session.get(key)
        if value:
            return value.get('data')
        return None
sync_app.py
    @staticmethod
    def _format_state_params(state_data, params):
        if state_data is None:
            raise MismatchingStateError()

        code_verifier = state_data.get('code_verifier')
        if code_verifier:
            params['code_verifier'] = code_verifier

        redirect_uri = state_data.get('redirect_uri')
        if redirect_uri:
            params['redirect_uri'] = redirect_uri
        return params
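The three snippets above can be traced end to end with a small stand-alone sketch. It is a simplified stand-in, not Authlib's implementation: the cache branch is dropped, the session is a plain dict, and the exception class, the `handle_callback` helper, the provider name `xero`, and the example URLs are all assumptions for illustration. It shows that the lookup succeeds when the prefixed key is present in the session and raises the mismatch error when the session arrives empty.

```python
class MismatchingStateError(Exception):
    """Stand-in for authlib's MismatchingStateError."""


def get_state_data(session, name, state):
    # Same key format as the get_state_data snippet above (cache branch omitted).
    value = session.get(f"_state_{name}_{state}")
    return value.get("data") if value else None


def format_state_params(state_data, params):
    # Mirrors _format_state_params: a missing entry is reported as a CSRF mismatch.
    if state_data is None:
        raise MismatchingStateError("mismatching_state")
    if state_data.get("redirect_uri"):
        params["redirect_uri"] = state_data["redirect_uri"]
    return params


def handle_callback(session, name, query):
    # Hypothetical helper combining the two steps done in authorize_access_token.
    params = {"code": query.get("code"), "state": query.get("state")}
    state_data = get_state_data(session, name, params["state"])
    return format_state_params(state_data, params)


query = {"code": "abc", "state": "s123"}

# Session that still holds the entry written before the redirect.
good_session = {
    "_state_xero_s123": {"data": {"redirect_uri": "https://app.example.com/callback"}}
}
ok = handle_callback(good_session, "xero", query)

# Session that came back empty, e.g. because the cookie was not returned.
try:
    handle_callback({}, "xero", query)
    lost = False
except MismatchingStateError:
    lost = True
```

In this sketch the error depends only on whether the prefixed entry survived in the session between the two requests, which is consistent with the advice to verify the session first.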
I have not put much time into thinking how to fix this, but the OAuth2 client documentation I've been reading suggests that the CSRF token is called 'state' so I'm not entirely sure why there's a munged key in the mix here at all.
I might be able to supply Django code if you're interested in replicating the error. But this bug has taken up a significant amount of time to characterise and find, so I am now in a time crunch to get things working.
For later reference, I used the instructions here to trigger the above scenario.
https://docs.authlib.org/en/latest/client/django.html
@bradbase here is the demo for django: https://github.com/authlib/demo-oauth-client/tree/master/django-google-login
It works well
@lepture Thank you.
Your example looks like it would work very well but it's optimised for logging against Google and I need to auth against Xero.
Xero has particular needs for its header and, as mentioned above, calls "state", "state". I have not seen a way to configure authlib finely enough to succeed.
Cheers
@bradbase state is added automatically. It is a part of the OAuth 2.0 logic.
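To illustrate that reply, here is a minimal sketch of the outgoing half of the flow, written with the standard library only. The function `begin_authorization`, the provider name `xero`, the client ID, and the example URLs are assumptions for illustration and are not Authlib's API. The point it tries to show is that the query parameter sent to the provider is still called `state`, while the prefixed name is only the key under which the client remembers that value in its own session.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse


def begin_authorization(session, name, authorize_url, client_id, redirect_uri):
    """Hypothetical helper: generate a state value, remember it in the
    session under a prefixed key, and build the authorization URL."""
    state = secrets.token_urlsafe(16)
    # The session key is namespaced by provider name and state value.
    session[f"_state_{name}_{state}"] = {"data": {"redirect_uri": redirect_uri}}
    query = urlencode(
        {
            "response_type": "code",
            "client_id": client_id,
            "redirect_uri": redirect_uri,
            "state": state,  # the provider only ever sees a parameter named "state"
        }
    )
    return f"{authorize_url}?{query}"


session = {}
url = begin_authorization(
    session,
    "xero",
    "https://idp.example.com/authorize",
    "my-client",
    "https://app.example.com/callback",
)
sent_state = parse_qs(urlparse(url).query)["state"][0]
print(f"_state_xero_{sent_state}" in session)
```

If the same session is available when the provider redirects back with that `state` value, the prefixed key can be rebuilt from it and the lookup succeeds.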
@kurian-dm please make sure your session works. Same as https://github.com/lepture/authlib/issues/518
Did you ever get this resolved?