Memory leak
This demo's memory usage scales with N:
import gc
from pgi.repository import Poppler

doc = Poppler.Document.new_from_file(url, '')  # url is a file:// URI for a local PDF
N = 100000
for i in xrange(N):
    if i % 10000 == 0:
        print len(gc.get_objects())
    p = doc.get_page(0)
Calling g_free on p._obj causes a double free, so the problem is Python-side. The number of alive objects grows by 4 per iteration.
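For anyone reproducing this, a quick way to see which types are accumulating (a sketch using only the standard library, written for Python 3; run it after some iterations of the demo above):

import gc
from collections import Counter

# Tally live objects by type name; the leaking wrapper types should
# climb steadily between snapshots taken a few thousand iterations apart.
counts = Counter(type(o).__name__ for o in gc.get_objects())
for name, n in counts.most_common(10):
    print(name, n)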
Two problems:
- GIBaseInfo instances leak (probably a reference cycle, and they define __del__)
- The gtype -> Python class lookup isn't cached, so a new GIBaseInfo is created on every call (see the caching sketch below)
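A fix for the second point could be a simple memo table keyed by gtype. The sketch below uses illustrative names (resolve_class_from_typelib stands in for pgi's real, uncached lookup and is not an actual pgi function):

_class_cache = {}

def lookup_class(gtype):
    # Resolve gtype -> Python class once and reuse it, so the expensive
    # GIBaseInfo-creating path only runs on the first request per gtype.
    try:
        return _class_cache[gtype]
    except KeyError:
        cls = resolve_class_from_typelib(gtype)  # hypothetical uncached path
        _class_cache[gtype] = cls
        return cls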
Is this something you're attacking or shall I have a go?
Go ahead.
I'd guess using weakrefs instead of __del__ should fix it, like cffi's gc(cdata, destructor): https://bitbucket.org/cffi/cffi/src/af4e381b5e99c27c466377145a84eeece6e5c199/cffi/gc_weakref.py?at=default
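For reference, the core of that pattern stripped down (names here are illustrative, not pgi's or cffi's actual API): keep a weakref to the wrapper object and let the weakref callback free the underlying C resource, so no __del__ is needed and reference cycles stay collectable.

import weakref

_live = {}  # weakref(wrapper) -> (destructor, handle)

def _on_collect(ref):
    # Fired by the GC when the wrapper is collected; release the C side.
    destructor, handle = _live.pop(ref)
    destructor(handle)

def track(wrapper, handle, destructor):
    # e.g. track(page, page_pointer, g_object_unref); only a weakref to the
    # wrapper is stored, so tracking never keeps the wrapper alive itself.
    _live[weakref.ref(wrapper, _on_collect)] = (destructor, handle)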
I gave you commit rights btw, so you should be able to push directly.
It also appears that .unref() isn't being called either. Not sure whether this is a bug introduced with #10; trying to understand why.
I'm looking at the unpack_return code for Object. It calls object.__new__(Poppler.Page) and sets its _ref, but I don't see any sign of garbage tracking. Should we add an UnrefFinalizer.track() on the resulting object?
@lazka -- this is the sort of thing I've done, and it seems to do the right thing. However, I get the impression that you intended for this to already work, so I don't know whether my solution is in the spirit of your other code. I've not made a pull request for the linked commit yet; it perhaps belongs on top of #10 if you are happy with the approach.
This is the code for track_and_unref, which is called by the code in the above link.
Tidying up my personal issues list, so closing this. Please create a new issue if you're still interested in tracking it.
I'm still hitting this problem. Not sure what a clean solution is; advice welcome!
This demonstrates the problem:
from pgi.repository import Poppler as poppler

doc = poppler.Document.new_from_file("file://test.pdf", "")
for i in range(doc.get_n_pages()):
    p = doc.get_page(i)
    # p.unref()
# doc.unref()
If I call .unref(), the problem goes away. So I'd like to determine whether the unref() can be automated, or, if it is supposed to be automatic, why it currently isn't.
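In the meantime, one way to automate it on the caller side (a sketch that assumes only the .unref() method used above) is to make the unref deterministic with a small context manager:

from contextlib import contextmanager

@contextmanager
def reffed(obj):
    # Yield the object to the caller, then drop the reference on block exit.
    try:
        yield obj
    finally:
        obj.unref()

# Usage with the loop above:
# for i in range(doc.get_n_pages()):
#     with reffed(doc.get_page(i)) as p:
#         ...  # use p; unref() runs automatically when the block exits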
Ping. I'd like to close this (preferably with a resolution); any advice?
I'm observing the same issue while using Gsf; it leads to a file descriptor leak:
Python 3.8.7 (default, Dec 21 2020, 21:23:03)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pgi
>>> pgi.install_as_gi()
>>> from gi.repository import Gsf
<stdin>:1: PyGIWarning: Gsf was imported without specifying a version first. Use gi.require_version('Gsf', '1') before import to ensure that the right version gets loaded.
>>> _gsf_inputstdio = Gsf.InputStdio.new("test_file")
>>> _gsf_infilemsole = Gsf.InfileMSOle.new(_gsf_inputstdio)
>>>
>>> _gsf_infilemsole.unref()
>>> _gsf_inputstdio.unref()
Calling unref() releases the file descriptor. (Deleting the objects doesn't help.)
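Until the binding handles this automatically, one workaround sketch (assuming only the unref() calls shown in the transcript above) is to register the unrefs with an ExitStack, so the descriptor is released even if something raises in between:

from contextlib import ExitStack

import pgi
pgi.install_as_gi()
from gi.repository import Gsf

with ExitStack() as stack:
    stream = Gsf.InputStdio.new("test_file")
    stack.callback(stream.unref)
    ole = Gsf.InfileMSOle.new(stream)
    stack.callback(ole.unref)
    # ... use ole here; on exit the callbacks run in reverse order
    # (ole.unref, then stream.unref), closing the file descriptor.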