pyahocorasick

memory leak in 1.4.0

Open · WojciechMula opened this issue 3 years ago · 6 comments

Hi, I'm sorry for reopening such an old issue, but I'm currently experiencing the same problem. I'm now using version 1.4.0, and after debugging with tracemalloc I'm seeing small but steady memory leaks in this code:

import ahocorasick

A = ahocorasick.Automaton()
MyList = [...]
for y, z in MyList:  # assumed (key, value) pairs; the original loop variable was x
    A.add_word(y, (y, z))

Is there a chance this bug has returned?

Thanks, Eden.

Originally posted by @EdenAzulay in https://github.com/WojciechMula/pyahocorasick/issues/81#issuecomment-705534956

WojciechMula, Dec 23 '20
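A minimal sketch of the kind of tracemalloc check described above, assuming you want one growth figure per round instead of comparing snapshots by hand. `measure_growth` is a hypothetical helper, not part of pyahocorasick. Note that tracemalloc only sees memory allocated through Python's allocators; anything a C extension allocates with plain `malloc` will not show up here.

```python
import gc
import tracemalloc
from typing import Callable, List


def measure_growth(workload: Callable[[], None], rounds: int = 5) -> List[int]:
    """Run `workload` repeatedly and return traced-memory growth (bytes) per round.

    Growth is measured against a baseline taken after one warm-up call, so
    one-time caches are not counted. A leak shows up as a steadily rising list.
    """
    tracemalloc.start()
    try:
        workload()  # warm-up round
        gc.collect()
        baseline, _peak = tracemalloc.get_traced_memory()
        growth: List[int] = []
        for _ in range(rounds):
            workload()
            gc.collect()  # drop unreachable cycles before measuring
            current, _peak = tracemalloc.get_traced_memory()
            growth.append(current - baseline)
        return growth
    finally:
        tracemalloc.stop()
```

For example, wrap a hypothetical `build(words)` function that creates an `ahocorasick.Automaton()` and calls `add_word` for each entry, then call `measure_growth(lambda: build(words))`. If the numbers keep rising even though each automaton goes out of scope, memory is being retained.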

I think that I'm also experiencing the same memory leak on add_word. Would love to see any updates :)

Edit: I was experiencing a different memory leak. Mine came from using a multiprocessing Pool and passing an ahocorasick automaton between workers. I suspect a serialization issue is preventing old objects from being cleaned up.

AlonSh, Mar 04 '21

@AlonSh could you please provide a minimal example?

WojciechMula, Mar 10 '21

Yeah: create an automaton, create a multiprocessing Pool, and call:

pool.apply_async(
    run_automaton,
    (automaton, text),  # the automaton is pickled and sent along with every task
    callback=callback_success,
    error_callback=_my_error_callback,
)

and you'll see memory usage explode after a number of calls.

AlonSh, Mar 11 '21
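One way to avoid shipping the automaton with every `apply_async` call is to hand it to each worker once through the Pool's `initializer`/`initargs`. A minimal sketch under that assumption: `init_worker`, `run_all`, and the one-argument `run_automaton` are hypothetical names, and matching uses pyahocorasick's `Automaton.iter()`, which yields `(end_index, value)` pairs. This cuts down how often the automaton is serialized; whether it avoids the leak reported here is not established in this thread.

```python
from multiprocessing import Pool
from typing import Any, Iterable, List, Tuple

# Per-worker global, set once by the pool initializer in each worker process.
_AUTOMATON: Any = None


def init_worker(automaton: Any) -> None:
    """Pool initializer: keep the automaton in this worker's module globals."""
    global _AUTOMATON
    _AUTOMATON = automaton


def run_automaton(text: str) -> List[Tuple[int, Any]]:
    """Task function: only `text` travels with each task."""
    # Automaton.iter(text) yields (end_index, value) for every match.
    return list(_AUTOMATON.iter(text))


def run_all(automaton: Any, texts: Iterable[str], processes: int = 4) -> List[List[Tuple[int, Any]]]:
    """Hypothetical driver: the automaton is sent once per worker, not once per task."""
    with Pool(processes=processes, initializer=init_worker, initargs=(automaton,)) as pool:
        pending = [pool.apply_async(run_automaton, (text,)) for text in texts]
        return [result.get() for result in pending]
```

Call `run_all` from under an `if __name__ == "__main__":` guard so that start methods which re-import the main module (spawn, forkserver) do not launch pools recursively.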

Great! Thank you.

WojciechMula, Mar 17 '21

I am pushing tests to run on CI on many Linux ... but while I can make them fail locally on Ubuntu 16, the tests seem to pass on more recent Linux releases. I wonder whether this depends on a particular compiler version? Otherwise, this is a head-scratcher. @AlonSh FWIW, I recycle worker processes after 1000 calls in my pools to cope with leaks. It's not perfect, but it's at least a workaround. See for instance https://github.com/nexB/scancode-toolkit/blob/e080f8354bed5813df9b619efe575ce9931a5a5b/src/scancode/cli.py#L1209

pombredanne, Mar 06 '22
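The process-recycling workaround maps directly onto the standard library's `multiprocessing.Pool(maxtasksperchild=...)`. A minimal sketch, not the scancode implementation linked above; `map_with_recycling` is a hypothetical helper, and `func` must be a module-level (picklable) function. `chunksize=1` is set because `maxtasksperchild` counts tasks, and `Pool.map` otherwise groups several items into one task.

```python
from multiprocessing import Pool
from typing import Any, Callable, Iterable, List


def map_with_recycling(
    func: Callable[[Any], Any],
    items: Iterable[Any],
    processes: int = 4,
    max_tasks: int = 1000,
) -> List[Any]:
    """Hypothetical helper: apply `func` to `items`, replacing each worker
    process after it has completed `max_tasks` tasks.

    When a worker exits, the OS reclaims all of its memory, including anything
    a C extension leaked, so a per-call leak cannot grow without bound.
    """
    with Pool(processes=processes, maxtasksperchild=max_tasks) as pool:
        # chunksize=1 makes one task per item, so max_tasks counts items.
        return pool.map(func, items, chunksize=1)
```

The trade-off is the cost of starting a fresh worker (and re-sending any per-worker state) every `max_tasks` items, so a larger value amortizes that cost at the price of more leaked memory between restarts.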

Hello guys,

Is there any update on this issue? I tested library version 2.0.0 today, and memory consumption grew every time the automaton was used in ProcessPoolExecutor futures. We had to stop the service after RAM consumption crossed 140 GB. I tried building images FROM ubuntu:20.04, python:3.8, and python:3.10, and the issue reproduces every time. The latest usable version for us remains 1.1.8. Please let me know if there is any troubleshooting information I can provide for the investigation.

Azzonith, Jan 24 '23
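For code built on `ProcessPoolExecutor` futures, the equivalent recycling knob is the `max_tasks_per_child` parameter (Python 3.11 and later). A minimal sketch, assuming you can move to 3.11+; `run_with_recycling` is a hypothetical helper and `fn` must be a module-level function. When `max_tasks_per_child` is set, the executor defaults to the "spawn" start method and rejects "fork", so the calling script needs an `if __name__ == "__main__":` guard. It bounds memory growth in the workers; it does not fix the underlying leak.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Any, Callable, Iterable, List


def run_with_recycling(
    fn: Callable[[Any], Any],
    items: Iterable[Any],
    workers: int = 4,
    max_tasks: int = 1000,
) -> List[Any]:
    """Hypothetical helper: submit one future per item; each worker process is
    replaced by a fresh one after running `max_tasks` tasks (Python 3.11+)."""
    with ProcessPoolExecutor(max_workers=workers, max_tasks_per_child=max_tasks) as executor:
        futures = [executor.submit(fn, item) for item in items]
        # Collect results in submission order; result() re-raises worker errors.
        return [future.result() for future in futures]
```

The same executor also accepts `initializer` and `initargs`, so an automaton can be installed once per worker instead of being passed with every `submit` call.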