`asgiref.local.Local` creates reference cycles that require garbage collection to be freed when executed in a sync context
We use Django, which uses asgiref.local.Local for its connection management. While debugging memory buildup, I've noticed that thread-critical Locals create reference cycles when used in a sync context.
Steps to reproduce
- Disable garbage collection
- Set garbage collectors debug mode to
DEBUG_LEAK - Create a
Localvariable in synchronous context - Try to read an inexistent attribute from said
Local - Force garbage collection and look at its output
import gc
from asgiref.local import Local
l = Local(thread_critical=True)
gc.collect() # make sure there is no lingering garbage
gc.disable()
gc.set_debug(gc.DEBUG_LEAK)
try:
getattr(l, "missing")
except AttributeError:
pass
gc.collect()
gc.set_debug(0)
Explanation
When Django tries to reference a database connection that is not yet established, it executes something like this (paraphrasing):
from asgiref.local import Local
connections = Local()
def get_connection(alias: str):
try:
return getattr(connections, alias)
except AttributeError:
conn = create_new_connection(alias)
setattr(connections, alias, conn)
return conn
Now, internally, asgiref's Local does this:
def __getattr__(self, key):
with self._lock_storage() as storage:
return getattr(storage, key)
@contextlib.contextmanager
def _lock_storage(self):
if self._thread_critical:
try:
asyncio.get_running_loop()
except RuntimeError:
yield self._storage
else:
...
else:
...
Now, putting everything together:
- Django calls
getattron theLocalobject - The
_lock_storagecontext manager is entered - It attempts to find the
asyncio.get_running_loop(), which raises aRuntimeError - The exception handler yields
self._storage(at this point, we're still in the exception handler inside the context manager) Localexecutesgetattronstorage, which raises anAttributeError- The
AttributeErroris propagated back to the context manager and since it's in the exception handler, it's linked to the previousRuntimeError(Python assumes theAttributeErrorwas raised while attempting to handle theRuntimeError) - At this point, both exceptions hold each other referenced (one through exception chaining, the other through the traceback)
- They also hold everything up to the point in my code where I attempted to use the database connection referenced, preventing those objects from being freed as well
Potential solution
Changing the _lock_storage implementation to the following fixes the issue:
@contextlib.contextmanager
def _lock_storage(self):
if self._thread_critical:
is_async = True
try:
asyncio.get_running_loop()
except RuntimeError:
is_async = False
if not is_async:
yield self._storage
else:
... # remaining code unchanged
else:
...
Which version of asgiref is this against? There is a potential fix for this in main that has not yet been in a public release.
@andrewgodwin I noticed this in asgiref 3.8.1, but I checked that this code path was the same on main.
So I've found another issue related to this. Your explanation of the problem helped me understand a bug in my test suite, which uses TransactionTestCase.serialized_rollback.
Quick reminder, BaseDatabaseCreation.create_test_db() stores a serialized fixture of initial data during database setup (in connection._test_serialized_contents), and django.test.testcases.TransactionTestCase._fixture_setup rolls it back if TransactionTestCase.serialized_rollback == True
Prior to asgiref==3.8.0, the connection handle is kept between the main thread that creates database and _fixture_setup (which is in another thread?).
Then in 3.8.0, the connection handle is recreated somehow, losing connection._test_serialized_contents effectively losing the serialized data and preventing data initialisation.
Django version: 5.0.13
Python version: 3.12.9
I've tracked this down to my usage of playwright: sync_playwright().start() (which sets a new ruuning loop) makes the reference loss, so tell me if I should open another issue but I don't think this is related
@patrys Thanks for the report. I've finally got some space in the upcoming window to sit down with asgiref. Can I ask...
Disable garbage collection... in the exception handler inside the context manager ... At this point, both exceptions hold each other referenced.
The key bit in the flow is the first one there right? (Or not?) — "Disable garbage collection". Python's GC can be a little slow shall we say, but it does work in the end no? Just trying to understand to effect here. (Independently of what to conclude)
@SebCorbin: Please do open a fresh one. There are a few related issues around 3.8 that need investigation. I'd rather have a duplicate than it slip through the gaps. (If you could minimise your example, somehow, that might be handy!) Thanks.
@carltongibson The gc does work eventually. I recommend disabling gc as a step to make sure there is no race condition during reproduction. We keep it enabled in production, but we also monkey-patch asgiref as the memory buildup prior to the next garbage collection cycle can get the entire process terminated (by the kernel OOM killer), and the collection process itself is very slow (and synchronous) with a lot of interlinked objects to visit.
@patrys Makes sense. Thanks.
... we also monkey-patch...
I see that here: https://github.com/saleor/saleor/pull/17594/
Would you like to open a PR here to upstream that?
@carltongibson, I will handle it.