aho-corasick

High memory usage compared to another implementation

Open · jonrscott opened this issue 3 years ago · 2 comments

Hi, I found this while looking for a lower-memory alternative to anknown/ahocorasick.

I have a dataset of around 6 million strings. The total memory usage, as shown by pprof, after building the automaton is just over 30GB, compared to 6.5GB for the anknown version.

Do you have any tips for working out why it's using so much more RAM?
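The pprof numbers above already show the total. A complementary check is to measure, in isolation, how much live heap each automaton keeps after it is built, so the two libraries can be compared on the same pattern set. The sketch below is a hedged illustration, not code from either library: `measureBuild` and `buildToy` are hypothetical helpers, and `buildToy` only mimics an automaton by allocating a dense table of `states × 256` transitions. In a real comparison you would pass a closure that calls each library's own constructor.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapAfterGC forces a garbage collection and returns the bytes of live heap.
func heapAfterGC() uint64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

// measureBuild reports roughly how many bytes of live heap the value
// returned by build retains. build is a stand-in for whichever
// automaton constructor is being compared (hypothetical here).
func measureBuild(build func() any) uint64 {
	before := heapAfterGC()
	v := build()
	after := heapAfterGC()
	runtime.KeepAlive(v) // keep v reachable until after the second GC
	if after < before {
		return 0
	}
	return after - before
}

// buildToy is a hypothetical placeholder: it allocates a dense
// transition table (states × 256 int32 entries) to mimic an automaton.
func buildToy(states int) func() any {
	return func() any {
		return make([][256]int32, states)
	}
}

func main() {
	n := measureBuild(buildToy(10_000))
	fmt.Printf("retained ≈ %.1f MiB\n", float64(n)/(1<<20))
}
```

If one library retains far more than the other, a heap profile (`go tool pprof` with the `inuse_space` sample index) of just the build step should point at the allocation sites responsible, for example per-state transition tables versus sparse transition lists.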

Thanks in advance.

jonrscott · Dec 09 '21 14:12

Hey, sorry for the late response, lol. It's been 2 years, more or less; I haven't had much time for open source. I'm not familiar with anknown's implementation, so I'll need to look at it before I can say anything definite.

petar-dambovaliev · Jul 25 '23 21:07

I will also need to analyse your data.

peter7891 · Jul 29 '23 15:07