cpython
cpython copied to clipboard
`PyThreadState` and `PyInterpreterState` are not freed
The following test program gains about 10M per second under top, on Python 3.11 only (confirmed for all development stages: 3.11.0a4, 3.11.0b1, 3.11.0rc1, 3.11.0) . 3.10 shows no memory growth.
from threading import Thread
while True:
t1 = Thread()
t1.start()
For direct results, here's the same program using the resource module that I just saw used over at #98467:
from threading import Thread
import resource
print("Before", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024, "MB")
i = 0
while True:
i += 1
t1 = Thread()
t1.start()
if i % 10000 == 0:
print(
f"memory after {i} iterations:",
resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024,
"MB",
)
The above program under Py 3.11.0 prints:
$ .venv-3.11/bin/python test5.py
Before 8.14453125 MB
memory after 10000 iterations: 11.6875 MB
memory after 20000 iterations: 15.30078125 MB
memory after 30000 iterations: 18.65234375 MB
memory after 40000 iterations: 22.26171875 MB
memory after 50000 iterations: 25.87109375 MB
memory after 60000 iterations: 29.22265625 MB
memory after 70000 iterations: 32.83203125 MB
memory after 80000 iterations: 36.44140625 MB
memory after 90000 iterations: 39.78125 MB
memory after 100000 iterations: 43.390625 MB
...
under Python 3.10 it prints:
$ .venv-3.10/bin/python test5.py
Before 9.25 MB
memory after 10000 iterations: 9.46484375 MB
memory after 20000 iterations: 9.47265625 MB
memory after 30000 iterations: 9.47265625 MB
memory after 40000 iterations: 9.47265625 MB
memory after 50000 iterations: 9.47265625 MB
memory after 60000 iterations: 9.47265625 MB
memory after 70000 iterations: 9.47265625 MB
memory after 80000 iterations: 9.47265625 MB
the issue looks extremely similar to another one we just fixed in greenlet, over at https://github.com/python-greenlet/greenlet/issues/328, although this one is much more surprising. Issue #98467 might also be related.
Environment:
$ uname -a
Linux photon3 5.17.14-200.fc35.x86_64 #1 SMP PREEMPT Thu Jun 9 14:02:42 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ python3
Python 3.11.0 (main, Nov 5 2022, 23:27:22) [GCC 11.3.1 20220421 (Red Hat 11.3.1-3)] on linux
- PR: gh-99268
- PR: gh-99301
- PR: gh-99385
Seems related to the 3.11 new frame handling.
cc @markshannon @brandtbucher
I can reproduce this.
At first glance, I'm not certain that its related to the frame stuff: debugging seems to confirm that the allocations and deallocations of stack chunks are balanced, and the issue is present even when changing which allocator we use, etc.
Plus, the leaked memory per thread (~370 bytes) seems too small to be our 16KB chunks. It's almost exactly sizeof(PyThreadState) (360 bytes), though! So my hunch is that we're leaking PyThreadState structs.
Found it... we don't free the memory if the thread state's _static member is nonzero:
https://github.com/python/cpython/blob/2eee9d9cd7eb1e396fa9a4af7c5fadeeafbdaa38/Python/pystate.c#L754-L760
It appears we're not setting it correctly. We initialize each thread by memcpy'ing the initial thread, which is statically allocated and has the member set to 1:
https://github.com/python/cpython/blob/2eee9d9cd7eb1e396fa9a4af7c5fadeeafbdaa38/Python/pystate.c#L847-L850
https://github.com/python/cpython/blob/2eee9d9cd7eb1e396fa9a4af7c5fadeeafbdaa38/Include/internal/pycore_runtime_init.h#L65-L70
So I think the fix is just to make sure that new threads have tstate->_static initialized to 0.
@ericsnowcurrently: is my understanding correct? I can write a patch, if so.
Also, it looks like PyInterpreterState has a similar leak.
I want to agree, but something seems off. I'll have to take a look tomorrow.
Confirmed that setting tstate->_static = false; after the memcpy fixes the issue.
It might be useful to define APIs to copy a PyInterpreterState and PyThreadStats where all of this can be encapsulated? granted there seems to be only one place in the code that is doing this right now so...
Having looked over the code again, I agree that tstate->_static = false; is all that needs to happen. Basically, we must consider everything that gets set in _PyThreadState_INIT . _static is the only struct member where we want to set a different value in new_threadstate(). It would be worth adding a note to that effect.
The same applies for PyInterpreterState_New() (interp->_static = false; + note pointing to _PyInterpreterState_INIT ).
(What threw me off last night is that we memcpy() the data out of initial, rather than _PyRuntime. I had missed that.)
Thinking about this more, we could even get rid of the _static flag entirely and rely on the fact that only the main interpreter and main thread state are statically allocated and the rest are heap allocated. Probably something we can do for 3.12 and later @ericsnowcurrently?
So you mean check interp == &_PyRuntime._main_interpreter instead of interp->_static (and likewise tstate == &tstate->interp->_initial_thread)? That would work. IIRC, I was planning on having additional statically allocated interpreter states and thread states, where the _static member would be more useful, but decided it wasn't worth it. Looks like _static got left behind.
Fixed, but leaving open to track the above discussion.
So you mean check interp == &_PyRuntime._main_interpreter instead of interp->_static (and likewise tstate == &tstate->interp->_initial_thread)? That would work. IIRC, I was planning on having additional statically allocated interpreter states and thread states, where the _static member would be more useful, but decided it wasn't worth it. Looks like _static got left behind.
Right, that's what I was thinking, it would make this more obvious when it is statically allocated and yes static allocation only makes sense for main interpreter and thread.
I just spent half a day looking for sources of memleaks in py3.11 provided by Ubuntu (22.04 provides python3.11 package containing 3.11.0~rc1-1~22.04 version which seems to contain this bug). Py3.11 final (I tested deadsnakes version) seems to be OK … but with 22.04 fairly widely present chances are I am not the last one.
As a sidenote, in my case problem appeared in code which embeds python and uses PyGILState_Ensure / PyGILState_Release to wrap foreign threads. App which (after simplification):
- initializes interpreter (
Py_Initialize+PyEval_InitThreads) - spawns 20 C threads,
- inside each thread does simple
PyGILState_Ensure+PyGILState_Release - joins those threads
- finalizes interpreter (
Py_Finalize)
leaked 20 objects of 360 bytes each. Switching number threads to 50 made it leak 50 such objects, etc. ASAN attributed those objecs to PyThreadState_New (called by PyGILState_Ensure).