flask-dance
flask-dance copied to clipboard
@cached_property on blueprint.session.token causing session leak?
I noticed a concerning potential race condition today while using flask-dance via the Flask local dev server. Here's the setup:
- I have a script polling my app's "get session" endpoint, using a session that is not logged in. This endpoint invokes
flask_dance.contrib.github.authorizedin order to check if the session should be pulled from GitHub. - While that script is running, I log in to my app via GitHub on a web browser.
- 1 in 3 times, the web browser login session will be duplicated to the script's session. This is bad because the script has done nothing to activate the session.
I tried to trace the code paths within flask-dance and I think that the use of @cached_property on either the blueprint's session.token or on the blueprint's session itself is part of the problem. Even though flask_dance.contrib.github is a LocalProxy to g.flask_dance_github (which is equivalent to github_bp.session) that isn't sufficient for separation between threads -- @cached_property's cache is associated with the blueprint, which is global across all threads/requests.
Basically, what I think is happening in my case is:
- The valid login process begins to handle the
authorized()blueprint endpoint - The OAuth access token is saved to
github_bp.session.token - Before the endpoint handler returns, my script calls my "get session" endpoint
- A new thread begins to handle that parallel request, which calls
github.authorized - This checks for
github_bp.session.token, which is cached by@cached_property - It returns successfully, allowing a subsequent call to
github.get("/user")to fetch the full session info - The script then gets the transferred session
- A new thread begins to handle that parallel request, which calls
- Finally, the original
authorized()blueprint endpoint concludes, and the session is set as expected to the original request
I've done a little testing of my app that is deployed in a test environment using gunicorn (which separates requests into processes). I haven't been able to reproduce the problem there yet, but because it's stochastic, I haven't fully ruled it out yet. Regardless, though, this seems like a concern as login sessions should never leak across threads.
Lastly, I tried a quick patch to take out the use of @cached_property for session and session.token, and it seemed to fix the issue -- I was unable to trigger the bad behavior even when the logs showed the possibility of a race condition.
Should we consider removing @cached_property or replace it with something that uses a thread-local cache?
That's some impressive detective work! This seems like a real problem. I have two questions:
- Do you know of an equivalent to
@cached_propertythat uses a thread-local cache? - Do you know of any kind of automated testing framework that we could use to reliably test for this bug? That way, we can verify that it's fixed, now and in the future.
Thanks for looking into it! I spent some time this morning trying to research answers to your questions:
- I wasn't able to find any good equivalent, especially not one that uses
flask.gas you would probably want to use. The closest I found was cachetools which offers a customizable@cached()decorator, although they don't seem to have a good@cached_propertyreplacement. It may be the most straightforward and transparent to just refactor the code using@propertyand to store the result inflask.g. - Would it be appropriate to consider this bug tested if we can use the existing test framework to reproduce it? My thought is that I could probably write a test case that, in a deterministic manner, reproduces the issue. It would try to reproduce the case that I observed - pausing in the middle of processing the
authorized()endpoint and making a secondary client call to an endpoint that calls the blueprint.authorizedproperty. I can submit a PR for that test case if you would like.