
AutophrasingTokenfilter leaks memory

Open kutschkem opened this issue 9 years ago • 4 comments

If I understand the code correctly, phraseMap should be read-only. However, it gets altered: references to its phrase lists are leaked into currentPhrases, and other phrases are then added to those same lists. Not only does this leak memory, but I wouldn't be surprised if it also caused actual bugs in phrase recognition (false positives). To fix this, the phraseMap.get calls need to be wrapped in CharArraySet.copy. I filed a pull request in the fork from emergecds, but the same changes apply here.
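To illustrate the aliasing problem described above, here is a minimal sketch. It is not the filter's actual code: it uses plain java.util collections as a stand-in for the Lucene types, and the class name, method names, and sample phrases are all hypothetical. The defensive copy via `new HashSet<>(...)` plays the role that CharArraySet.copy would play in the real fix.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the filter; java.util types replace the Lucene ones.
class PhraseMapAliasing {

    // Builds a small map from a first token to the phrases starting with it.
    static Map<String, Set<String>> buildPhraseMap() {
        Map<String, Set<String>> phraseMap = new HashMap<>();
        phraseMap.put("new", new HashSet<>(Arrays.asList("new york")));
        phraseMap.put("san", new HashSet<>(Arrays.asList("san francisco")));
        return phraseMap;
    }

    // Leaky variant: currentPhrases is the very set stored in phraseMap,
    // so adding to it silently grows the map's entry for "new".
    static int leakyLookup(Map<String, Set<String>> phraseMap) {
        Set<String> currentPhrases = phraseMap.get("new");
        currentPhrases.addAll(phraseMap.get("san"));
        return phraseMap.get("new").size();
    }

    // Fixed variant: copy the set first, so phraseMap stays unchanged.
    static int copyingLookup(Map<String, Set<String>> phraseMap) {
        Set<String> currentPhrases = new HashSet<>(phraseMap.get("new"));
        currentPhrases.addAll(phraseMap.get("san"));
        return phraseMap.get("new").size();
    }

    public static void main(String[] args) {
        System.out.println(leakyLookup(buildPhraseMap()));   // prints 2
        System.out.println(copyingLookup(buildPhraseMap())); // prints 1
    }
}
```

In the leaky variant the entry for "new" ends up containing "san francisco" as well, which is the kind of growth and potential false positive described above. With the copy, the map entry keeps its original single phrase.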

kutschkem avatar Apr 20 '15 08:04 kutschkem

@kutschkem: Looking quickly at the code, it does seem like a leak. Did you encounter any problems after wrapping it with CharArraySet.copy?

kaismh avatar May 31 '15 08:05 kaismh

@kaismh No, I didn't encounter functional problems before or after the fix. I don't understand the code well enough to be 100% sure that I didn't overlook anything, though.

kutschkem avatar Jun 01 '15 08:06 kutschkem

@kutschkem: Many thanks. I am using your fix and haven't encountered any issues so far. I will update you if any problems come up.

kaismh avatar Jun 01 '15 09:06 kaismh

The relevant code is still unchanged in this repository. Is this repo dead?

kutschkem avatar Sep 28 '18 20:09 kutschkem