Possible leak when exception is raised in inner coroutine
Hi, I have been debugging a memory leak in an application that uses Tornado and I was able to narrow down the behavior to what I believe is a representative minimum example, based on an existing unit test.
Edit:
- Reproducing with CPython 3.9.16 on macOS, although the original issue was on Linux with CPython 3.6.8 (I know, super old! but hopefully any workaround would still apply).
- Tested against
masteras of a6dfd70d7a6398b2187021df63ca77daa5781e5e but from what I can tell this reproduces very far back to old versions of Tornado (e.g. 4.5+)
Possibly related, from what I can gather (although closed for a long time now): #1872 #2229
I have a couple questions:
- Is this expected behavior / is there some reason I'm missing for why a reference cycle forms here? I don't have a ton of experience with Tornado directly so maybe this is just a side effect of how the IOLoop is implemented or something, and it's expected that GC is needed to clean up in this scenario.
- Is there a "correct" workaround that would help free the exception / traceback, etc.? I can
delthe local variables, but my fear is that in that case the remaining stack frames, tracebacks etc. would stick around, possibly resulting in other leaked variables that aren'tdeleted (particularly if the exception is bubbled through several coroutines in a more complicated example).
Here is the test I'm reproducing with, adapted from the existing one in gen_test.py:
@skipNotCPython
@unittest.skipIf(
(3,) < sys.version_info < (3, 6), "asyncio.Future has reference cycles"
)
def test_coroutine_refcounting(self):
# On CPython, tasks and their arguments should be released immediately
# without waiting for garbage collection.
@gen.coroutine
def raise_exc():
yield
raise ValueError("Some error")
@gen.coroutine
def inner():
class Foo(object):
pass
local_var = Foo()
self.local_ref = weakref.ref(local_var)
try:
yield raise_exc()
except ValueError:
# del local_var # <- this works to free the variable
pass
@gen.coroutine
def inner2():
yield inner()
# Error! There is a still a strong reference to the local variable via
# the ValueError traceback -> `inner` frame, but why is that?
self.assertIsNone(self.local_ref())
self.io_loop.run_sync(inner2, timeout=3)
self.assertIsNone(self.local_ref())
self.finished = True
Thanks!
Is this expected behavior / is there some reason I'm missing for why a reference cycle forms here? I don't have a ton of experience with Tornado directly so maybe this is just a side effect of how the IOLoop is implemented or something, and it's expected that GC is needed to clean up in this scenario.
I wouldn't say it's "expected" (I do treat this kind of thing as a bug when I find it), but it's a fairly common consequence of the way the coroutine decorator and Runner work. It's reimplementing (in python) a lot of things that the python interpreter does, and especially on the exception-handling path it's really easy to get reference cycles through tracebacks. The answer is usually to del a local variable somewhere in tornado.gen.Runner.
FWIW the situation is improved with native (async def) coroutines, although I think the issue is still more likely to occur than in normal synchronous code.
Is there a "correct" workaround that would help free the exception / traceback, etc.?
So far we've always been able to fix these kinds of issues by fixing the coroutine runner so no workaround is needed in the application. I'd have to dig into this to see where exactly the issue is but with a reproducible test case it shouldn't be too hard. The relatively-new circlerefs_test.py is how I diagnose these issues and keep them from recurring.