pyahocorasick
Python module (C extension and plain Python) implementing the Aho-Corasick algorithm
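The core idea of the algorithm fits in a small pure-Python sketch. This is not pyahocorasick's implementation, which is mainly a C extension. The class name `AhoCorasick` and its methods are hypothetical, and `iter` only loosely mirrors the library's `Automaton.iter` by yielding `(end_index, word)` pairs. The sketch builds a trie of the keywords, adds failure links in breadth-first order, and then scans the text in a single pass.

```python
from collections import deque


class AhoCorasick:
    """Hypothetical minimal Aho-Corasick automaton (illustrative sketch only)."""

    def __init__(self, words):
        self.goto = [{}]   # goto[state][char] -> next state (trie edges)
        self.fail = [0]    # failure link of each state (0 is the root)
        self.out = [[]]    # words recognised when a state is reached
        for word in words:
            self._add(word)
        self._build()

    def _add(self, word):
        # Walk or extend the trie one character at a time.
        state = 0
        for ch in word:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(word)

    def _build(self):
        # Breadth-first order guarantees that a state's failure target
        # (always shallower) is finished before the state itself.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit the matches reachable through the failure link.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def iter(self, text):
        """Yield (end_index, word) for every keyword occurrence in text."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for word in self.out[state]:
                yield i, word


ac = AhoCorasick(["he", "she", "his", "hers"])
print(list(ac.iter("ushers")))  # [(3, 'she'), (3, 'he'), (5, 'hers')]
```

Each character of the text is consumed once. Failure links let the scan continue from the longest suffix that is still a keyword prefix, so "she" and "he" are both reported at index 3 without rescanning.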
Hi, I ran a large batch of matches with `iter_long`, and most of the time it worked well. However, when the overlapped pattern doesn't match, the inclusive substring won't be able...
Attempt to fix #133
BREAKING CHANGE: This PR changes the signature and behavior of the `keys`, `values`, and `items` methods of the `Automaton` class to process the optional `pattern` parameter as a Unix shell...
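The description above is truncated, so the exact new semantics are not shown. The sketch below only illustrates what Unix shell-style matching means in general, using Python's standard `fnmatch` module. `keys_matching` is a hypothetical helper that filters a plain list. It is not the `Automaton.keys` method.

```python
from fnmatch import fnmatchcase


def keys_matching(keys, pattern=None):
    """Hypothetical helper: filter keys with a Unix shell-style pattern.

    Illustrates shell-style semantics only; this is not the pyahocorasick API.
    """
    if pattern is None:
        return list(keys)
    # fnmatchcase is case-sensitive on every platform, unlike fnmatch.fnmatch.
    return [key for key in keys if fnmatchcase(key, pattern)]


words = ["he", "her", "hers", "she", "his"]
print(keys_matching(words, "h*"))      # ['he', 'her', 'hers', 'his']
print(keys_matching(words, "?he"))     # ['she']
print(keys_matching(words, "h[ei]s"))  # ['his']
```

In shell-style patterns, `*` matches any run of characters, `?` matches exactly one, and `[...]` matches one character from a set. A pattern must match the whole key, which is why `"hers"` is not returned for `"h[ei]s"`.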
A few things I am considering for the next, incompatible version (in no particular order): 1. Remove value_type from the Automaton constructor (i.e. ``STORE_ANY``, ``STORE_INTS``, ``STORE_LENGTH``). Using ``STORE_{INT,LENGTH}`` makes sense at the C-language level, but...
The second match `(5, 'her')` and the last one `(14, 'she')` are not aligned with word boundaries. How can I remove them? Or could we force them to match...
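One way to drop such matches is to post-process the `(end_index, word)` pairs and keep only those whose neighbours on both sides are non-word characters or the ends of the text. This sketch assumes that a "word character" is alphanumeric or an underscore, which roughly follows regex `\w`. `whole_word_matches` and `naive_matches` are hypothetical helpers. `naive_matches` is a brute-force stand-in for `Automaton.iter` so the example runs without the library.

```python
def is_word_char(ch):
    # Assumption: letters, digits and underscore count as word characters.
    return ch.isalnum() or ch == "_"


def whole_word_matches(text, matches):
    """Keep only (end_index, word) matches bounded by non-word characters."""
    result = []
    for end, word in matches:
        start = end - len(word) + 1
        before_ok = start == 0 or not is_word_char(text[start - 1])
        after_ok = end == len(text) - 1 or not is_word_char(text[end + 1])
        if before_ok and after_ok:
            result.append((end, word))
    return result


def naive_matches(text, words):
    # Hypothetical stand-in for Automaton.iter: every (end_index, word) hit.
    return sorted(
        (i + len(w) - 1, w)
        for w in words
        for i in range(len(text) - len(w) + 1)
        if text.startswith(w, i)
    )


text = "he saw her there"
raw = naive_matches(text, ["he", "her"])
print(raw)                            # [(1, 'he'), (8, 'he'), (9, 'her'), (13, 'he'), (14, 'her')]
print(whole_word_matches(text, raw))  # [(1, 'he'), (9, 'her')]
```

The filter runs after matching, so the automaton itself is unchanged. The cost is one extra check per reported match.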
The output has overlaps between different phrases. How can I solve this overlapping problem? Is there any advice? As shown in the example below, I want the output to be:...
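A common way to remove overlaps is greedy leftmost-longest selection. Each `(end_index, word)` pair is converted to a span, and the spans are sorted by start position, with the longest first on ties. A span is kept only if it begins after the last kept span ends. This is a sketch, not the library's `iter_long`, and `longest_non_overlapping` is a hypothetical helper.

```python
def longest_non_overlapping(matches):
    """Greedy leftmost-longest selection of non-overlapping matches.

    `matches` is an iterable of (end_index, word) pairs; the result is a
    list of (start, end, word) triples with inclusive end indices.
    """
    spans = sorted(
        ((end - len(word) + 1, end, word) for end, word in matches),
        key=lambda s: (s[0], -(s[1] - s[0])),  # by start, longest first
    )
    chosen = []
    last_end = -1
    for start, end, word in spans:
        if start > last_end:  # no overlap with the previously kept span
            chosen.append((start, end, word))
            last_end = end
    return chosen


# Matches in "new york city hall" for the phrases below.
matches = [(2, "new"), (7, "new york"), (12, "york city"), (17, "city hall")]
print(longest_non_overlapping(matches))  # [(0, 7, 'new york'), (9, 17, 'city hall')]
```

Greedy selection is simple and linear after sorting. However, it does not always maximise the total covered text. If that matters, a weighted interval-scheduling pass over the same spans is the usual alternative.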
Version: 1.1.7, Python 3.6. Hi, I'm seeing a pretty drastic memory leak when using only `A.keys(...)`. Is this expected? I'm doing the following: ``` A = ahocorasick.Automaton() for doc in my_data:...
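To check whether memory really grows across repeated calls, Python's standard `tracemalloc` module can compare traced allocations before and after a loop. `memory_growth` is a hypothetical helper. `tracemalloc` only sees memory allocated through Python's allocators, so if a C extension calls the system `malloc` directly, the growth may not show up here. Watching the process's resident memory is a complementary check.

```python
import gc
import tracemalloc


def memory_growth(fn, iterations=1000):
    """Return net bytes still traced after calling fn `iterations` times."""
    gc.collect()
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(iterations):
        fn()
    gc.collect()  # drop garbage so only retained objects are counted
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


_retained = []


def leaky():
    _retained.append(bytearray(1024))  # deliberately kept alive


def clean():
    bytearray(1024)  # freed as soon as the call returns


print(memory_growth(leaky))  # large and positive (over 1000 KiB retained)
print(memory_growth(clean))  # close to zero
```

For the report above, the call under test would be something like `lambda: list(A.keys())`, with `A` built from the reporter's data. Growth that keeps increasing across runs points to retained objects.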
The library uses "ahocorasick" as its name