auto-phrase-tokenfilter
AutophrasingTokenfilter leaks memory
If I understand the code correctly, phraseMap should be read-only. However, it gets modified: references to the phrase lists stored in it leak into currentPhrases, and other phrases are then added to those same lists. Not only does this leak memory, but I wouldn't be surprised if it also caused actual bugs in phrase recognition (false positives). To fix this, the phraseMap.get calls need to be wrapped in CharArraySet.copy. I filed a pull request against the emergecds fork, but the same changes apply here.
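To illustrate the aliasing problem, here is a minimal sketch that uses plain java.util collections as a stand-in for Lucene's CharArraySet. The class name, method names, and sample phrases are hypothetical and are not taken from the filter's code. The HashSet copy constructor plays the role that CharArraySet.copy would play in the real fix.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the filter's state; not the actual filter code.
public class PhraseMapAliasing {

    // Stand-in for phraseMap: first word -> phrases starting with that word.
    static Map<String, Set<String>> buildPhraseMap() {
        Map<String, Set<String>> phraseMap = new HashMap<>();
        phraseMap.put("new", new HashSet<>(Set.of("new york")));
        phraseMap.put("san", new HashSet<>(Set.of("san francisco")));
        return phraseMap;
    }

    // Leaky variant: currentPhrases is the very set stored in the map,
    // so adding to it silently grows the map entry for "new".
    static int leakyLookup(Map<String, Set<String>> phraseMap) {
        Set<String> currentPhrases = phraseMap.get("new");
        currentPhrases.addAll(phraseMap.get("san"));
        return phraseMap.get("new").size();
    }

    // Fixed variant: copy first, so the map entry stays untouched.
    static int copyingLookup(Map<String, Set<String>> phraseMap) {
        Set<String> currentPhrases = new HashSet<>(phraseMap.get("new"));
        currentPhrases.addAll(phraseMap.get("san"));
        return phraseMap.get("new").size();
    }

    public static void main(String[] args) {
        System.out.println(leakyLookup(buildPhraseMap()));   // prints 2
        System.out.println(copyingLookup(buildPhraseMap())); // prints 1
    }
}
```

In the leaky variant the entry for "new" now also contains "san francisco", which is the kind of growth and possible false positive described above.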
@kutschkem: Looking quickly at the code, it does seem like a leak. Did you encounter any problems after wrapping it with CharArraySet.copy?
@kaismh No, I didn't encounter functional problems before or after the fix. I don't understand the code well enough to be 100% sure that I didn't overlook anything, though.
@kutschkem: Many thanks. I am using your fix and haven't encountered any issues so far. I will update you if any problems come up.
The relevant code is still unchanged in this repository. Is this repo dead?