cypari2
Memory leak for vectors and matrices
With version 2.1.2, the following causes cypari2 to allocate memory at several GB per minute:
>>> import cypari2
>>> pari = cypari2.pari_instance.Pari()
>>> while True:
... M = pari.matrix(10, 10, range(100))
Observed in Python 3.8 and 3.9 as part of SageMath 9.3-9.5, on both Linux (gcc) and macOS (clang).
For what it's worth, the other cypari does not seem to have this problem.
Thanks for the report. I confirm the behaviour on my machine as well.
Interestingly, this leaks memory
v = pari.vector(10, range(10))
but this does not:
v = pari(range(10))
though they seem to result in the same PARI object.
Yes, these end up with the same PARI object, but the two functions allocate memory in different ways.
I'm a bit confused as to why ref_target is used here (for __setitem__ and in matrix).
We create the item, put it in the cache, then increase the reference count, and then set the PARI coefficient. But I don't know why we need to increase the reference count: the item is stored in the cache, so it will be around as long as the vector/matrix is around, or until the index gets set to something new. Either way it should be fine. Increasing the reference count via ref_target seems to be exactly what creates the memory leak.
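A toy sketch of the __setitem__ flow just described (all names hypothetical, no cypari2 involved): the itemcache already keeps an entry alive for the lifetime of the container, so an additional, never-released reference, the analogue of the extra incref done by ref_target, is exactly what turns into a leak.

```python
import sys

leaked_refs = []  # stands in for references taken but never released

class ToyVector:
    def __init__(self):
        self.itemcache = {}

    def setitem_ok(self, i, item):
        # The cache alone keeps `item` alive as long as the vector lives.
        self.itemcache[i] = item

    def setitem_leaky(self, i, item):
        self.itemcache[i] = item
        leaked_refs.append(item)  # extra reference that nothing ever drops

v = ToyVector()
entry = object()
v.setitem_ok(0, entry)
before = sys.getrefcount(entry)
v.setitem_leaky(1, entry)
# Two new references appeared: one in the cache slot, one leaked.
assert sys.getrefcount(entry) == before + 2
del v  # the vector and its cache go away...
assert len(leaked_refs) == 1  # ...but the leaked reference remains
```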
Be careful: inside ref_target there is a call to fixGEN that moves the GEN off the pari stack (if it was still there).
Yes, so instead of ref_target one should call fixGEN?
Ok, my previous idea was nonsense. However, the solution seems to be rather simple. The problem is that the matrix/vector is still on the stack when we do the item assignment. Assigning entries to a vector/matrix on the stack sounds like a bad idea.
Here is how I checked (in Sage):
sage: cython('''
....: import cypari2
....: pari = cypari2.pari_instance.Pari()
....: from cypari2.gen cimport Gen
....:
....: def foo():
....:     cdef Gen M = pari.matrix(10, 10)
....:     M.fixGEN()
....:     M[1,2] = 20
....: ''')
The above does not leak memory. However, when you comment out M.fixGEN() it leaks memory, because the matrix is still on the stack when you assign the integer.
When I run make check I get:
================================================================================
Testing cypari2.pari_instance
/usr/lib/python3.10/doctest.py:1350: RuntimeWarning: cypari2 leaked 147988456 bytes on the PARI stack
exec(compile(example.source, filename, "single",
/usr/lib/python3.10/doctest.py:1350: RuntimeWarning: cypari2 leaked 238449424 bytes on the PARI stack
exec(compile(example.source, filename, "single",
/usr/lib/python3.10/doctest.py:1350: RuntimeWarning: cypari2 leaked 246420800 bytes on the PARI stack
exec(compile(example.source, filename, "single",
/usr/lib/python3.10/doctest.py:1350: RuntimeWarning: cypari2 leaked 251542152 bytes on the PARI stack
exec(compile(example.source, filename, "single",
/usr/lib/python3.10/doctest.py:1350: RuntimeWarning: cypari2 leaked 255781968 bytes on the PARI stack
exec(compile(example.source, filename, "single",
/usr/lib/python3.10/doctest.py:1350: RuntimeWarning: cypari2 leaked 197923384 bytes on the PARI stack
exec(compile(example.source, filename, "single",
/usr/lib/python3.10/doctest.py:1350: RuntimeWarning: cypari2 leaked 210963184 bytes on the PARI stack
exec(compile(example.source, filename, "single",
/usr/lib/python3.10/doctest.py:1350: RuntimeWarning: cypari2 leaked 215847936 bytes on the PARI stack
exec(compile(example.source, filename, "single",
/usr/lib/python3.10/doctest.py:1350: RuntimeWarning: cypari2 leaked 193350080 bytes on the PARI stack
exec(compile(example.source, filename, "single",
/usr/lib/python3.10/doctest.py:1350: RuntimeWarning: cypari2 leaked 178484864 bytes on the PARI stack
exec(compile(example.source, filename, "single",
================================================================================
Is this related?
I don't think this is a bug. It is not exactly a feature, but it is a consequence of something which is regarded as a feature. It is also not really a memory leak. If you watch the memory usage while this loop is running you will see it grow to about 75% of physical memory, then decay and stabilize at about 20% of physical memory. That is the python garbage collector at work.
After the last sync of cypari and cypari2, cypari2 changed their memory management scheme for Gen objects. In cypari 1.4.1 all GENs live on the pari stack. Pari has a notion of "cloning" GENs which means copying a GEN to the heap, and "uncloning" heap based GENs which means freeing them. Cypari2 makes use of this, and supports Gen objects whose GEN is either on the heap or on the stack. Intermediate values when evaluating an expression generally stay on the stack. But containers, like matrices, are moved to the heap as soon as an entry is assigned a value. And when the stack gets to be more than half full, all GENs on the stack are moved to the heap. This is a way of dealing with Pari's suggested strategy of cleaning, or "repiling" the stack during a long computation.
Essentially, cypari2 has delegated its management of GENs to the python garbage collector, at least for the heap based GENs. Since this loop involves creating lots of matrices which are immediately destroyed, it generates a very large number of heap based Gens whose GENs must be deallocated by python's garbage collector. The garbage collector allows the memory footprint to grow quite large before becoming more aggressive and taming the growth.
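For reference, CPython's cyclic collector only runs after enough allocations pile up; its generation thresholds are the tunable knobs behind the "grow large, then decay" pattern described above. This is illustrative only (example threshold values, not a recommendation), and objects freed purely by reference counting never wait for the collector at all.

```python
import gc

# Save the current thresholds, tighten them, then restore.
original = gc.get_threshold()
gc.set_threshold(100, 5, 5)   # collect more eagerly (example values)
assert gc.get_threshold() == (100, 5, 5)
gc.set_threshold(*original)   # restore the previous settings
```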
It is worth noting that the stack design provides much more efficient deallocation than could ever be possible with a reference counting garbage collection scheme. If you have a few million dead matrices on the stack you can deallocate all of them in one machine instruction just by setting the stack pointer before the first one. But a reference counting garbage collector will have to call free() millions of times in order to deallocate them. However, since every GEN is wrapped by a Gen object, which must be deallocated by the python garbage collector, it is not really possible for cypari(2) to take full advantage of this efficiency.
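A minimal sketch of why stack-style deallocation is so cheap (names are made up, pure Python): freeing everything allocated after a saved mark is a single truncation, the analogue of resetting the PARI stack pointer, whereas a refcounting scheme must free each object individually.

```python
class BumpStack:
    def __init__(self):
        self.data = []          # stands in for the PARI stack

    def mark(self):
        return len(self.data)   # remember the current stack pointer

    def alloc(self, obj):
        self.data.append(obj)
        return obj

    def release_to(self, mark):
        # One operation discards every allocation made since `mark`.
        del self.data[mark:]

stack = BumpStack()
top = stack.mark()
for i in range(10**6):          # a million "dead matrices"
    stack.alloc(i)
stack.release_to(top)           # all of them gone in one step
assert len(stack.data) == 0
```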
I retract my statement that this is not a bug. By adding some print statements in the Gen __dealloc__ method I was able to demonstrate that there are lots of situations in which Gens do not get deallocated, even though there should be no object holding a reference. This is not limited to matrices. Here is a sample run demonstrating this:
Normal Python object behavior:
>>> class Dumb():
...     def __del__(self):
...         print("Destroying a dumb object.")
...
>>> x = Dumb()
>>> x = Dumb()
Destroying a dumb object.
>>> x = Dumb()
Destroying a dumb object.
>>> x = Dumb()
Destroying a dumb object.
Pari Gen object behavior:
>>> x = pari(1234567879012345678901234567890)
Destroying a dumb object.
>>> x = pari(1234567879012345678901234567890)
>>> x = None
Destroying gen on stack.
Destroying gen on stack.
>>> x = pari(1234567879012345678901234567890)
>>> x = pari(1234567879012345678901234567890)
>>> x = pari(1234567879012345678901234567890)
>>> x = None
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on stack.
>>> x = pari(1234567879012345678901234567890)
>>> x = pari(1234567879012345678901234567890)
>>> x = pari(1234567879012345678901234567890)
>>> x = pari(1234567879012345678901234567890)
>>> x = None
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on stack.
>>> x = pari(1234567879012345678901234567890)
>>> x = pari(1234567879012345678901234567890)
>>> x = pari(1234567879012345678901234567890)
>>> x = pari(1234567879012345678901234567890)
>>> x = pari(1234567879012345678901234567890)
>>> x = None
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on stack.
For some reason (which is probably important to understand) the case of a pari Gen created from a python list is different. It behaves normally.
>>> from cypari import pari
>>> x = pari(range(3))
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on stack.
>>> x = pari(range(3))
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on heap.
>>> x = pari(range(3))
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on heap.
>>> x = pari(range(3))
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on heap.
An addendum. While the refcount behavior above looks wrong for the pari int Gens, replacing
M = pari.matrix(10, 10, range(100))
with
x = pari(12345678901234567890)
in the original loop does not cause the same kind of memory issues.
If you think through what happens when you repeatedly call
M = pari.matrix(10, 10, range(100))
the behavior we are seeing is exactly what you would expect.
At the beginning of each iteration the bottom of the pari stack is a matrix with a refcount of 1, held by the variable M. (Actually held by the namespace dict and assigned to the name M.) When the assignment expression is evaluated two things happen, presumably atomically.
First a new matrix Gen is created and assigned to stackbottom. The matrix Gen whose GEN is immediately above the bottom now has refcount 2: one reference is held by the variable M and one by the new matrix Gen on the bottom, whose next object is now set to the matrix immediately above the bottom.
Second, the matrix Gen on the bottom is assigned to the variable M. This sets its refcount to 1, and reduces the refcount of the matrix above from 2 to 1. The steady state is that there is a long chain of matrix Gens whose GENs are on the pari stack. Each Gen in the chain has refcount 1, held by the Gen whose GEN is directly below its GEN on the pari stack. Eventually the GENs belonging to the Gens in this chain fill the pari stack to half of its maximum size. At that point all GENs on the pari stack are moved to the heap (i.e. cloned). When a Gen in the long chain is moved to the heap its refcount is reduced to 0, so it gets destroyed immediately after it is moved to the heap.
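A pure-Python model of the chain just described (hypothetical names, no cypari2): each new stack-bottom object keeps a next reference to the one just above it, so rebinding M never lets any object's refcount reach zero.

```python
import gc

class ToyGen:
    live = 0  # count of instances not yet deallocated

    def __init__(self, nxt=None):
        self.next = nxt         # the Gen directly above on the stack
        ToyGen.live += 1

    def __del__(self):
        ToyGen.live -= 1

M = None
for _ in range(100):
    M = ToyGen(nxt=M)           # like M = pari.matrix(...) in the loop

# Only one name is bound, yet every ToyGen in the chain is still alive,
# each held by the one "below" it -- the steady state described above.
assert ToyGen.live == 100

M = None                        # drop the head: the whole chain unwinds
gc.collect()
assert ToyGen.live == 0
```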
The docstring for pari_instance.stacksizemax says: Return the maximum size of the PARI stack, which is determined at startup in terms of available memory.
If that is correct it also explains why memory grows under this loop to a (large) fixed fraction of total memory but never reaches OOM.
It is not clear to me how this should be fixed. One option might be to always create new Gens on the heap instead of on the stack. However, pari will always create a new GEN on the pari stack. So this would mean immediately cloning the GEN when a new Gen is created. That costs the time required for the copy. I assume that the tactic of leaving GENs on the stack as long as possible was intended to increase performance by avoiding these copies. That performance increase would be lost. On the other hand, in this particular context, the copy is happening anyway. It just gets delayed until the stack is half full.
Another option might be to do away with the heap altogether. That is probably not a good idea.
If what I said above is correct, it is definitely not the whole story since it does not account for the fact that many Gens are moved to the heap immediately. Also, changing the threshold for moving all Gens to the heap has no effect on the memory usage.
I can now finish the story. The discussion above explains why the loop
>>> while True:
... M = pari.matrix(10, 10, range(100))
generates many matrices on the pari stack. The other thing to explain is why having lots of matrices on the stack uses huge amounts of memory. The reason is that the pari.matrix method creates Gens for all of the matrix entries and puts them on the heap. It also caches the entries in its itemcache dict. That means that the entries cannot be destroyed until the matrix is destroyed. But a stack-based Gen, other than stackbottom, cannot be destroyed, because it is referenced as Gen.next by some other Gen on the stack. So the matrices and their entries persist until the stack gets so large that the stack-based Gens get moved to the heap. Note also that there are, in this case, 100 times as many entries as matrices.
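A sketch of that multiplier effect (hypothetical names, pure Python): every matrix caches wrapper objects for all 100 of its entries, so as long as the chain of matrices survives, 100 times as many entry objects do.

```python
import gc

class ToyEntry:
    live = 0  # count of entry wrappers not yet deallocated

    def __init__(self):
        ToyEntry.live += 1

    def __del__(self):
        ToyEntry.live -= 1

class ToyMatrix:
    def __init__(self, prev=None):
        self.next = prev                            # chain of matrices
        self.itemcache = {i: ToyEntry() for i in range(100)}

M = None
for _ in range(50):
    M = ToyMatrix(prev=M)

assert ToyEntry.live == 50 * 100   # entries persist with their matrices

M = None                            # drop the chain of matrices
gc.collect()
assert ToyEntry.live == 0           # freeing the chain frees all entries
```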
Oh, and the fix is to call A.fixGEN() as soon as a matrix A is created.