
`asgiref.local.Local` creates reference cycles that require garbage collection to be freed when executed in a sync context

Open patrys opened this issue 10 months ago • 8 comments

We use Django, which uses asgiref.local.Local for its connection management. While debugging memory buildup, I've noticed that thread-critical Locals create reference cycles when used in a sync context.

Steps to reproduce

  1. Disable garbage collection
  2. Set the garbage collector's debug mode to DEBUG_LEAK
  3. Create a Local variable in a synchronous context
  4. Try to read a nonexistent attribute from that Local
  5. Force a garbage collection and inspect its output
import gc
from asgiref.local import Local

l = Local(thread_critical=True)
gc.collect()  # make sure there is no lingering garbage
gc.disable()
gc.set_debug(gc.DEBUG_LEAK)
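try:
    getattr(l, "missing")  # the lookup misses, so Local raises AttributeError
except AttributeError:
    pass
gc.collect()  # DEBUG_LEAK reports collectable cycles on stderr and keeps them in gc.garbage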
gc.set_debug(0)

Explanation

When Django tries to reference a database connection that is not yet established, it executes something like this (paraphrasing):

from asgiref.local import Local

connections = Local()

def get_connection(alias: str):
    try:
        return getattr(connections, alias)
    except AttributeError:
        conn = create_new_connection(alias)  # placeholder for Django's real connection setup
        setattr(connections, alias, conn)
        return conn

Now, internally, asgiref's Local does this:

def __getattr__(self, key):
    with self._lock_storage() as storage:
        return getattr(storage, key)

@contextlib.contextmanager
def _lock_storage(self):
    if self._thread_critical:
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            yield self._storage
        else:
            ...
    else:
        ...

Now, putting everything together:

  1. Django calls getattr on the Local object
  2. The _lock_storage context manager is entered
  3. It calls asyncio.get_running_loop(), which raises a RuntimeError because no event loop is running
  4. The exception handler yields self._storage (at this point, we're still inside the exception handler within the context manager)
  5. Local executes getattr on storage, which raises an AttributeError
  6. The AttributeError is thrown back into the context manager's generator at the yield, and because that yield sits inside the exception handler, Python sets its __context__ to the RuntimeError (it assumes the AttributeError was raised while handling the RuntimeError)
  7. At this point, the two exceptions reference each other (one through exception chaining, the other through the traceback), forming a reference cycle
  8. Through their tracebacks, they also keep alive every frame up to the point in my code where I attempted to use the database connection, along with everything those frames reference, so none of it can be freed until the garbage collector runs
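The chaining in steps 4 to 6 can be reproduced without asgiref or Django. The sketch below is a hypothetical stand-in, not asgiref's code: `lock_storage_buggy` copies the shape of the `_lock_storage` excerpt (a yield inside the `except RuntimeError` handler), `read_attr` plays the role of `Local.__getattr__`, and a `SimpleNamespace` stands in for the storage object. It assumes a recent CPython 3 and that it runs from plain synchronous code, so `asyncio.get_running_loop()` raises.

```python
import asyncio
import contextlib
from types import SimpleNamespace


@contextlib.contextmanager
def lock_storage_buggy(storage):
    # Hypothetical stand-in for the thread-critical, sync path of _lock_storage.
    try:
        asyncio.get_running_loop()  # raises RuntimeError: no running event loop
    except RuntimeError:
        # Yielding *inside* the handler: the RuntimeError is still the
        # exception being handled while the with-body runs.
        yield storage


def read_attr(storage, key):
    # Plays the role of Local.__getattr__.
    with lock_storage_buggy(storage) as s:
        return getattr(s, key)


def demo():
    try:
        read_attr(SimpleNamespace(), "missing")
    except AttributeError as err:
        return err


err = demo()
print(type(err.__context__).__name__)  # RuntimeError
```

Because the `yield` sits inside the handler, the `AttributeError` that `contextlib` throws back into the generator picks up the `RuntimeError` as its `__context__`.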

Potential solution

Changing the _lock_storage implementation to the following fixes the issue:

@contextlib.contextmanager
def _lock_storage(self):
    if self._thread_critical:
        is_async = True
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            is_async = False
        if not is_async:
            yield self._storage
        else:
            ...  # remaining code unchanged
    else:
        ...
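The effect of the patched shape above can be checked with a similar harness. Again, `lock_storage_fixed` is a hypothetical, sync-only stand-in for the proposed `_lock_storage`, and its async branch is deliberately not modelled. Because the `yield` now happens after the `except` block has finished, no exception is being handled when the `AttributeError` is thrown back in, so no chain is created.

```python
import asyncio
import contextlib
from types import SimpleNamespace


@contextlib.contextmanager
def lock_storage_fixed(storage):
    # Hypothetical, sync-only stand-in for the proposed _lock_storage.
    is_async = True
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        is_async = False
    # The except block has already finished here, so no exception is
    # being handled when control passes to the with-body.
    if not is_async:
        yield storage
    else:
        raise NotImplementedError("the async branch is not modelled here")


def read_attr(storage, key):
    # Plays the role of Local.__getattr__.
    with lock_storage_fixed(storage) as s:
        return getattr(s, key)


def demo():
    try:
        read_attr(SimpleNamespace(), "missing")
    except AttributeError as err:
        return err


err = demo()
print(err.__context__)  # None
```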

patrys, Jan 24 '25 11:01

Which version of asgiref is this against? There is a potential fix for this in main that has not yet been included in a public release.

andrewgodwin, Jan 24 '25 17:01

@andrewgodwin I noticed this in asgiref 3.8.1, but I checked that this code path was the same on main.

patrys, Jan 24 '25 18:01

So I've found another issue related to this. Your explanation of the problem helped me understand a bug in my test suite, which uses TransactionTestCase.serialized_rollback.

Quick reminder: BaseDatabaseCreation.create_test_db() stores a serialized fixture of the initial data during database setup (in connection._test_serialized_contents), and django.test.testcases.TransactionTestCase._fixture_setup reloads it if TransactionTestCase.serialized_rollback == True.

Prior to asgiref==3.8.0, the connection handle was shared between the main thread that creates the test database and _fixture_setup (which runs in another thread?).

Since 3.8.0, the connection handle is somehow recreated, which drops connection._test_serialized_contents, effectively losing the serialized data and preventing data initialisation.

Django version: 5.0.13
Python version: 3.12.9

SebCorbin, Mar 28 '25 09:03

I've tracked this down to my usage of Playwright: sync_playwright().start() (which sets a new running loop) causes the reference loss. Tell me if I should open another issue, but I don't think this is related.
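The branch taken by a thread-critical `Local` depends on whether `asyncio.get_running_loop()` succeeds in the calling thread, as in the `_lock_storage` excerpt quoted earlier in the thread. A minimal sketch of that check, using a hypothetical `has_running_loop` helper; it does not model Playwright itself, only the difference between plain synchronous code and code executing while a loop is running. It assumes no event loop is already running when the script starts.

```python
import asyncio


def has_running_loop() -> bool:
    # Same check the thread-critical branch of _lock_storage performs.
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


async def inside_loop() -> bool:
    return has_running_loop()


outside = has_running_loop()         # plain synchronous code
inside = asyncio.run(inside_loop())  # code running while a loop is active
print(outside, inside)  # False True
```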

SebCorbin, Apr 10 '25 10:04

@patrys Thanks for the report. I've finally got some space in the upcoming window to sit down with asgiref. Can I ask...

Disable garbage collection... in the exception handler inside the context manager ... At this point, both exceptions hold each other referenced.

The key bit in the flow is the first one there, right? (Or not?) That is, "Disable garbage collection". Python's GC can be a little slow, shall we say, but it does work in the end, no? Just trying to understand the effect here. (Independently of what to conclude.)

@SebCorbin: Please do open a fresh one. There are a few related issues around 3.8 that need investigation. I'd rather have a duplicate than it slip through the gaps. (If you could minimise your example, somehow, that might be handy!) Thanks.

carltongibson, Apr 10 '25 10:04

@carltongibson The gc does work eventually. I recommend disabling gc as a step to make sure there is no race condition during reproduction. We keep it enabled in production, but we also monkey-patch asgiref as the memory buildup prior to the next garbage collection cycle can get the entire process terminated (by the kernel OOM killer), and the collection process itself is very slow (and synchronous) with a lot of interlinked objects to visit.
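The effect described here (objects staying alive until the next cyclic collection) can be illustrated in general terms. This sketch does not reproduce asgiref's exact cycle; it uses a different, well-known exception/traceback cycle (an exception saved in a local of the very frame its traceback points to). `Payload` and `leak_via_traceback` are hypothetical names, and CPython's reference counting is assumed. With automatic collection disabled, reference counting alone cannot free the payload; an explicit `gc.collect()` can.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a large object held by a frame."""


def leak_via_traceback():
    payload = Payload()
    try:
        raise ValueError("boom")
    except ValueError as exc:
        # Keeping the exception in a local closes a cycle:
        # frame -> saved -> exception -> __traceback__ -> frame
        saved = exc
    return weakref.ref(payload)


gc.collect()  # start from a clean slate
gc.disable()
try:
    ref = leak_via_traceback()
    alive_before_collect = ref() is not None  # only the cycle keeps it alive
    gc.collect()  # explicit collection still runs while gc is disabled
    alive_after_collect = ref() is not None
finally:
    gc.enable()

print(alive_before_collect, alive_after_collect)  # True False
```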

patrys, Apr 10 '25 11:04

@patrys Makes sense. Thanks.

... we also monkey-patch...

I see that here: https://github.com/saleor/saleor/pull/17594/

Would you like to open a PR here to upstream that?

carltongibson, Apr 10 '25 11:04

@carltongibson, I will handle it.

fowczarek, Apr 10 '25 12:04