
memory leak

Open richardhundt opened this issue 6 years ago • 8 comments

Version: 1.1.7 Python 3.6

Hi, I'm seeing a pretty drastic memory leak using A.keys(...) only. Is this expected?

I'm doing the following:

import ahocorasick

A = ahocorasick.Automaton()
for doc in my_data:
    A.add_word(doc['my_field'], doc)

for s in my_strings:
    keys = list(A.keys(s))
    print(keys)

That's it. My data sets are pretty large, but I obviously know when I'm in the second loop, and memory keeps growing without bound.
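One way to put a number on "memory keeps growing" is to run the lookup loop several times under `tracemalloc` and compare the traced total before and after. The following is a minimal sketch, not part of the original report: `traced_growth` and its `rounds` parameter are hypothetical names, and `workload` stands in for the second loop above. Note that `tracemalloc` only sees allocations routed through Python's allocators, so a leak made with raw `malloc` inside a C extension may not show up here.

```python
import tracemalloc


def traced_growth(workload, rounds=5):
    """Return the net traced memory (in bytes) gained over `rounds` calls.

    `workload` is any zero-argument callable, for example a function that
    runs the query loop from the report. Hypothetical helper for illustration.
    """
    tracemalloc.start()
    try:
        # Warm-up call so one-time caches and lazy imports are not counted.
        workload()
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(rounds):
            workload()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before
```

With the automaton from the report, something like `traced_growth(lambda: [list(A.keys(s)) for s in my_strings])` should stay close to zero if nothing is retained between rounds; a value that scales with `rounds` suggests a leak.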

richardhundt avatar Apr 21 '18 18:04 richardhundt

@richardhundt Thank you for reporting this.

WojciechMula avatar Apr 21 '18 21:04 WojciechMula

@richardhundt Again, thank you very much for the report, and sorry for the inconvenience. Hopefully I have fixed the leak (TBH, an embarrassing bug). If you like, you can try the code directly from master, or wait until tomorrow (well, this evening), when I'll prepare a new release.

WojciechMula avatar Apr 24 '18 22:04 WojciechMula

Better to wait for the new release; a build failed.

WojciechMula avatar Apr 24 '18 22:04 WojciechMula

OK, version 1.1.8 was released. @richardhundt please verify, if you can.

WojciechMula avatar Apr 25 '18 18:04 WojciechMula

works perfectly now, thank you!

richardhundt avatar Apr 26 '18 08:04 richardhundt

Hi, I'm sorry for reopening such an old issue, but I'm currently experiencing the same problem. I'm using version 1.4.0 and seeing small, steady memory leaks (found by debugging with tracemalloc) on:

A = ahocorasick.Automaton()
MyList = [...]
for x in MyList:
    A.add_word(y, (y, z))

Is there a chance this bug has returned?

Thanks, Eden.

EdenAzulay avatar Oct 08 '20 12:10 EdenAzulay

See https://github.com/WojciechMula/pyahocorasick/pull/166, where we still have a memory leak on Unicode builds.

pombredanne avatar Apr 27 '22 14:04 pombredanne

Hello,

I can confirm the memory leak exists in 1.4.4, and it might have something to do with pickling. tracemalloc output:

[ Top 10 ]
/usr/lib/python3.8/multiprocessing/reduction.py:51: size=23.2 GiB, count=36957, average=659 KiB
/usr/lib/python3.8/linecache.py:137: size=508 KiB, count=5133, average=101 B
/usr/lib/python3.8/tracemalloc.py:65: size=60.1 KiB, count=962, average=64 B
/usr/lib/python3.8/tracemalloc.py:185: size=42.2 KiB, count=900, average=48 B
:1: size=37.7 KiB, count=444, average=87 B
:640: size=29.2 KiB, count=388, average=77 B
/usr/local/lib/python3.8/dist-packages/kafka/protocol/types.py:193: size=28.4 KiB, count=638, average=46 B
/usr/local/lib/python3.8/dist-packages/kafka/metrics/stats/sampled_stat.py:89: size=20.6 KiB, count=416, average=51 B
/usr/lib/python3.8/copy.py:76: size=19.2 KiB, count=123, average=160 B
/usr/local/lib/python3.8/dist-packages/kafka/cluster.py:281: size=18.4 KiB, count=100, average=188 B

When I roll back to 1.1.8, the problem no longer reproduces.
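For anyone wanting to collect a comparable listing, a "Top 10" report of this shape is what `tracemalloc` produces when a snapshot is grouped by line. A minimal sketch follows, assuming the report above was gathered this way; `top_allocations` is a hypothetical helper and `workload` stands in for whatever code path is suspected of leaking.

```python
import tracemalloc


def top_allocations(workload, limit=10):
    """Run `workload` under tracemalloc and return its largest allocation sites.

    Hypothetical helper for illustration: `workload` is any zero-argument
    callable. Results are grouped by source line and sorted by size,
    largest first.
    """
    tracemalloc.start()
    try:
        workload()
        # The snapshot must be taken while tracing is still active.
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    return snapshot.statistics("lineno")[:limit]
```

Printing each returned statistic with `print(stat)` gives one `file:line: size=..., count=..., average=...` row per allocation site, so a line that dominates the total stands out at the top.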

Azzonith avatar Jun 16 '22 15:06 Azzonith

@Azzonith I created https://github.com/WojciechMula/pyahocorasick/issues/183 to track your issue

pombredanne avatar Jan 14 '23 15:01 pombredanne

@EdenAzulay Your issue is tracked in https://github.com/WojciechMula/pyahocorasick/issues/135

Closing this one.

pombredanne avatar Jan 14 '23 15:01 pombredanne