asgiref
[3.8.0+] Django "active" language is not safe to use with ASGI
I can reproduce this issue with version 3.8.0 and above.
Create a Django project with one view that returns the active user language:

```python
from django.http import HttpResponse
from django.utils import translation

def view(request):
    return HttpResponse(translation.get_language())
```
Start Django under ASGI (using Daphne, for instance).
Then, in two terminals/terminal tabs, start a curl loop against this endpoint with a different language in the cookies:

```shell
while true; do curl ...; done
```
You will observe that each request affects the response in the other terminal/terminal tab.
It will be like this:

```
language: X
language: Y
language: X
```
Meaning it is not consistent when it should be. In particular, we are interested in the Content-Language header, which is set automatically, and for us the value of this header is affected by other requests...
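For reference, here is a rough Python equivalent of the curl loops (a sketch only; it assumes the view is served at http://localhost:8000/, that LocaleMiddleware is enabled so Content-Language gets set, and that the default LANGUAGE_COOKIE_NAME of django_language is in use). It fires pairs of concurrent requests with different language cookies and prints the Content-Language each response comes back with:

```python
import asyncio

import httpx

async def hit(client: httpx.AsyncClient, lang: str) -> None:
    # Send the language as a cookie, mirroring the curl loops above.
    response = await client.get(
        "http://localhost:8000/",
        headers={"Cookie": f"django_language={lang}"},
    )
    print(f"sent {lang!r} -> Content-Language: {response.headers.get('content-language')}")

async def main() -> None:
    async with httpx.AsyncClient() as client:
        for _ in range(10):
            # Two "terminals" worth of requests racing against each other.
            await asyncio.gather(hit(client, "en"), hit(client, "fr"))

asyncio.run(main())
```

On an affected asgiref version, the header for one language can flip to the other while the requests overlap.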
NOTE: Downgrading asgiref to a version below the one mentioned (for instance 3.7.2) fixes the issue.
Possibly related: https://github.com/django/asgiref/issues/473
@sshishov can you confirm the versions of Django and the other dependencies you're using here, please?
Hi @carltongibson
My system:
macOS + Docker (Ubuntu)
Python: Python 3.11.10
Django: Django==4.2.15
ASGIRef: asgiref==3.8.1
For other dependencies, which exactly do you want? We have a lot.
I can try to create a repo and reproduce the issue there, then share it with you. The same issue happened with:
- Django + Gunicorn + Uvicorn (as workers)
- Django + Channels + Daphne
@sshishov That's fine. That's the core ones.
If you had a minimal test reproduction that would save me a cycle creating one myself (but no stress).
Thanks!
@sshishov I pushed #477 in the local-task-fix branch that you could try out if you had a moment.
Hi @sshishov can I ask you to test the branch in #478 and confirm it resolves your issue? Thanks. 🙏
Hi @carltongibson, going to test it in 1-2 days. I guess I can test against main now (as the MR has been merged), correct?
@sshishov Correct. Thanks!
Hi folks!
In the company where I work, we are in the process of migrating our django deployment from WSGI to ASGI. While everything is mostly going ok, we just stumbled upon this.
We were thinking of downgrading asgiref to 3.7.2, since the description says it doesn't contain the issue, but we are running Django 5.1.x, which requires asgiref>=3.8.1
We also thought about trying the main branch, but it seems to cause this new issue https://github.com/django/asgiref/issues/491
Any recommendations here?
Also, I'm available to run any tests to ensure this gets properly fixed and released ASAP. Will also try to take a look at the code myself and see if I can help too 🙏
Hi @bellini666 — thanks for the input!
So, first, take a quick look at this comment and @andersk's reply there. I'll copy them here too (since they're on a closed PR that took me a second to find):
Me:
Hi @andersk — thanks for this. The other maybe related issue I had on my radar was #475.
It's been on my list to go back to pre-3.8.0 and refollow the logic behind each change, but haven't got to any of that 🤹. If we can pin down what's actually happening here, and resolve the issues it would be amazing 🤩
#367 is on the list too. (That was the first one I think.)
@andersk:
There’s no question those are related, in that #475 was supposedly fixed by #478 which caused #491 (as verified by running the test case in #491 before and after #478), and #367 caused #492 (as verified similarly).
What this is waiting for is for someone to sit down and unpick carefully exactly what's happening with each of the changes we introduced.
It's (still) on my list to sit down with all that and step back through it, but I didn't get there yet (obviously). If you were to have capacity to look at it, and could unpick it clearly, I do have capacity to help get the fixes in and a release made — but we need to unravel it first. (I don't want to just layer on another maybe fix without reasoning it through first.)
> What this is waiting for is for someone to sit down and unpick carefully exactly what's happening with each of the changes we introduced.
Nice, thank you for the fast response! 😊
I'll try to sit down in the following days and help with that
@carltongibson here is my analysis:
The original #367, which caused this issue, changed `Local` so that it stores its data in a ContextVar, but it had the problem of modifying the dict in place instead of making a copy of it and modifying the copy.
#478 fixed the issue by converting the ContextVar to a dict that maps each attribute to a different ContextVar, effectively preventing the original problem: even if the same dict is modified in place, the underlying keys each point to a different ContextVar, each one holding its current value in the context.
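If it helps to visualize the idea, here is a minimal, hypothetical sketch (not asgiref's actual implementation) of per-attribute ContextVars: because `ContextVar.set()` only affects the current context, in-place mutation of the shared dict no longer leaks values between requests.

```python
from contextvars import ContextVar

class AttrStorage:
    """Hypothetical sketch of the #478 approach: one ContextVar per attribute."""

    def __init__(self) -> None:
        # The dict object itself may be shared across contexts; it only maps
        # attribute names to their ContextVars.
        self._vars: dict[str, ContextVar] = {}

    def set(self, key: str, value) -> None:
        if key not in self._vars:
            self._vars[key] = ContextVar(f"local.{key}")
        # Only affects the current context, so concurrent requests don't see
        # each other's values even though they share the same dict object.
        self._vars[key].set(value)

    def get(self, key: str):
        return self._vars[key].get()
```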
This could have been fixed by copying the dict, which I see was discussed in the original issue and discarded due to memory concerns. That ship has sailed already, but I don't think copying the dict would be a problem, as the keys would still point to the original values. A deep copy would be different, but a shallow copy would be fine (maybe we can revisit this?)
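To make the in-place mutation problem concrete, here is a minimal, hypothetical sketch (not the actual asgiref `Local` code) of a dict held in a single ContextVar: two concurrent "requests" set different languages, and without a shallow copy before writing, the second one leaks into the first, which is essentially the behaviour reported at the top of this issue.

```python
import asyncio
from contextvars import ContextVar

_data: ContextVar[dict] = ContextVar("data")

def set_language(lang: str, isolate: bool) -> None:
    storage = _data.get()
    if isolate:
        storage = storage.copy()  # shallow copy: keys still point to the original values
    storage["language"] = lang    # without the copy, this mutates the shared dict
    _data.set(storage)

async def fake_request(name: str, lang: str, isolate: bool) -> tuple:
    set_language(lang, isolate)
    await asyncio.sleep(0)        # yield so the other "request" runs and writes too
    return name, _data.get()["language"]

async def main(isolate: bool) -> None:
    _data.set({})  # parent context holds the dict that both tasks inherit
    results = await asyncio.gather(
        fake_request("req-1", "en", isolate),
        fake_request("req-2", "fr", isolate),
    )
    print(f"isolate={isolate}: {results}")

asyncio.run(main(isolate=False))  # req-1 reports "fr": the other request leaked in
asyncio.run(main(isolate=True))   # each request keeps its own language
```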
Now, why did the fix cause the `CurrentThreadExecutor already quit or is broken` errors and deadlocks?
The issues happen when tasks are scheduled to run in the loop and end up running after the executor where they were scheduled is already `_broken`.
I ran the tests from #478 both with `storage_object = self._data.get({}).copy()` (which is the one-line change that achieves the same as the PR itself) and with the original line inside `__setattr__`, and this is what happens:
- When the context for the nested execution was not isolated (i.e. `Local` was modifying the original dict in place), when that scheduled task runs, `AsyncToSync.executors.current` got set to the previous executor in the previous context, but due to the non-isolation issue it also affected its own context, and because of that it could run in the currently running executor instead of the nested one that is `_broken`.
- When the context for the nested execution is isolated (with `.copy()` or the #478 fix), the scheduled tasks in the loop will still see that executor as the current one, but it is already `_broken`, and thus the `CurrentThreadExecutor already quit or is broken` error happens.
In short, the issue that caused the non-isolation of the contexts ended up allowing scheduled tasks to properly find the currently running executor to run in.
#494 fixes the issue that remains after the isolation fix by making the scheduled tasks find the closest running executor in the stack, which is basically what the original code with the isolation issue was doing "by mistake", but it is the correct approach.
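As a purely illustrative sketch of that idea (hypothetical code, not the actual #494 diff): instead of failing on the executor recorded for the now-exited nested context, walk the executor stack outwards and pick the closest one that is still running.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FakeExecutor:
    name: str
    running: bool

def closest_running_executor(stack: List[FakeExecutor]) -> Optional[FakeExecutor]:
    # Walk from the innermost (most recently entered) executor outwards.
    for executor in reversed(stack):
        if executor.running:
            return executor
    return None

# The nested executor has already quit; falling back to the outer one avoids
# the "CurrentThreadExecutor already quit or is broken" error.
stack = [FakeExecutor("outer", running=True), FakeExecutor("nested", running=False)]
print(closest_running_executor(stack).name)  # -> "outer"
```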
So, it seems to me that by merging #494 we will fix the issues that #478 introduced, while still keeping the isolation fix from #478. I've also run some tests with it and it indeed seems fine!
P.S. While reading the code, I noticed a typing workaround I added 2 years ago, which is no longer required. I've opened #516 to fix it :)
@bellini666 Great. Thanks. That's very helpful. Let me follow through it, and we'll see if we can move forward. 🎁
OK, I'm going to close this. It's resolved by #478; problems from that should be resolved by #494, which will be in the next release shortly. Thanks all!
@bellini666 Just to cycle back to this:
> This could have been fixed by copying the dict, which I see was discussed in the original issue and discarded due to memory concerns. That ship has sailed already, but I don't think copying the dict would be a problem, as the keys would still point to the original values. A deep copy would be different, but a shallow copy would be fine (maybe we can revisit this?)
Yes, happy to review an assessment there. Thanks.