Memory leak

Open pwaller opened this issue 10 years ago • 11 comments

This demo has a memory usage which scales with N:

import gc
from pgi.repository import Poppler
# url is a URI (e.g. file://...) pointing at any PDF
doc = Poppler.Document.new_from_file(url, '')
N = 100000
for i in xrange(N):
    if i % 10000 == 0:
        print len(gc.get_objects())
    p = doc.get_page(0)

Calling g_free on p._obj causes a double free, so the problem is python-side. The number of alive objects grows by 4 per iteration.

pwaller avatar Jul 25 '14 09:07 pwaller

Two problems:

  • GIBaseInfo instances leak (probably a cycle and they define __del__)
  • The gtype->python class lookup doesn't get cached, creating a GIBaseInfo each call
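A minimal sketch of the caching idea in the second point, assuming the lookup can be keyed on the gtype alone. The names `class_for_gtype` and `expensive_lookup` are hypothetical stand-ins, not pgi functions:

```python
# Hypothetical stand-ins: none of these names come from pgi itself.
_class_cache = {}
lookup_calls = []  # records how often the expensive path runs


def expensive_lookup(gtype):
    """Placeholder for the costly gtype -> Python class resolution."""
    lookup_calls.append(gtype)
    return type("Class_%s" % gtype, (object,), {})


def class_for_gtype(gtype):
    """Return the class for gtype, resolving it only on the first request."""
    try:
        return _class_cache[gtype]
    except KeyError:
        cls = _class_cache[gtype] = expensive_lookup(gtype)
        return cls


first = class_for_gtype("PopplerPage")
second = class_for_gtype("PopplerPage")
```

With a cache like this, repeated calls for the same type would reuse one class object instead of building a fresh info object each time.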

lazka avatar Jul 25 '14 12:07 lazka

Is this something you're attacking or shall I have a go?

pwaller avatar Jul 25 '14 12:07 pwaller

Go ahead.

I'd guess using weakrefs instead of __del__ should fix it.

Like cffi's "gc(cdata, destructor)" https://bitbucket.org/cffi/cffi/src/af4e381b5e99c27c466377145a84eeece6e5c199/cffi/gc_weakref.py?at=default
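A rough sketch of the weakref approach, in the spirit of the cffi helper linked above but not copied from it. `FinalizerRegistry` and `Wrapper` are hypothetical names, and the destructor here is just a list append standing in for a real unref call:

```python
import gc
import weakref


class FinalizerRegistry(object):
    """Run destructor(handle) once the tracked wrapper is collected."""

    def __init__(self):
        # Maps id(weakref) -> weakref, keeping each weakref alive until it fires.
        self._refs = {}

    def track(self, wrapper, destructor, handle):
        def on_collect(ref):
            # The wrapper is already gone here; only the raw handle is used.
            del self._refs[id(ref)]
            destructor(handle)

        ref = weakref.ref(wrapper, on_collect)
        self._refs[id(ref)] = ref


class Wrapper(object):
    """Hypothetical stand-in for a Python-side wrapper with no __del__."""

    def __init__(self, handle):
        self.handle = handle


released = []  # records the handles passed to the destructor
registry = FinalizerRegistry()

w = Wrapper(handle=42)
registry.track(w, released.append, w.handle)
del w
gc.collect()  # not needed on CPython, but harmless on other interpreters
```

Because the callback closes over the handle rather than the wrapper, the wrapper needs no `__del__` and the callback cannot keep it alive. On Python 3.4 and later, `weakref.finalize` offers the same pattern in the standard library.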

I gave you commit rights btw, so you should be able to push directly.

lazka avatar Jul 25 '14 13:07 lazka

It also appears that .unref() isn't being called. Not sure whether this is a bug introduced with #10; I'm trying to understand why.

pwaller avatar Jul 28 '14 13:07 pwaller

I'm looking at the unpack_return code for Object.

It calls object.__new__(Poppler.Page) and sets its _ref, but I don't see any sign of garbage tracking.

Should we add an UnrefFinalizer.track() on the resulting object?

pwaller avatar Jul 28 '14 13:07 pwaller

@lazka -- this is the sort of thing I've done which seems to do the right thing.

However, I get the impression that you intended for this to already work so I don't know if my solution is in the spirit of your other code. I've not made a pull request for the linked commit yet, it perhaps belongs on top of #10 if you are happy with the approach.

pwaller avatar Jul 28 '14 14:07 pwaller

This is the code for track_and_unref which is called by the code in the above link.

pwaller avatar Jul 28 '14 14:07 pwaller

Tidying up my personal issues list, so closing this. Please create a new issue if you're still interested in tracking it.

pwaller avatar Mar 21 '15 12:03 pwaller

I'm still hitting this problem. Not sure what a clean solution is, advice welcomed!

This demonstrates the problem:

from pgi.repository import Poppler as poppler
doc = poppler.Document.new_from_file("file://test.pdf", "")
for i in range(doc.get_n_pages()):
    p = doc.get_page(i)
    # p.unref()
# doc.unref()

If I call .unref(), the problem goes away.

So I'd like to determine whether the unref() can be automated or, if it is supposed to be automatic, why it currently isn't.
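Until that is settled, the manual call can at least be made harder to forget. This is only a workaround sketch: `unref_when_done` is a hypothetical helper, `FakePage` stands in for a real pgi object, and it assumes that a single explicit unref() per object is correct, as observed above:

```python
from contextlib import contextmanager


@contextmanager
def unref_when_done(obj):
    """Yield obj, then call obj.unref() even if the body raises."""
    try:
        yield obj
    finally:
        obj.unref()


class FakePage(object):
    """Hypothetical stand-in for an object exposing unref()."""

    def __init__(self):
        self.unref_calls = 0

    def unref(self):
        self.unref_calls += 1


page = FakePage()
with unref_when_done(page) as p:
    pass  # work with p here
```

In the loop above this would read `with unref_when_done(doc.get_page(i)) as p: ...`.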

pwaller avatar May 26 '15 08:05 pwaller

Ping. I'd like to close this (preferably with a resolution), any advice?

pwaller avatar May 30 '16 16:05 pwaller

I'm observing the same issue while using Gsf, where it leads to a file descriptor leak:

Python 3.8.7 (default, Dec 21 2020, 21:23:03)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pgi
>>> pgi.install_as_gi()
>>> from gi.repository import Gsf
<stdin>:1: PyGIWarning: Gsf was imported without specifying a version first. Use gi.require_version('Gsf', '1') before import to ensure that the right version gets loaded.
>>> _gsf_inputstdio = Gsf.InputStdio.new("test_file")
>>> _gsf_infilemsole = Gsf.InfileMSOle.new(_gsf_inputstdio)
>>>
>>> _gsf_infilemsole.unref()
>>> _gsf_inputstdio.unref()

Calling unref releases the file descriptor. (Deleting the objects doesn't help)

AmitANetskope avatar Oct 06 '21 09:10 AmitANetskope