pyahocorasick
Python module (C extension and plain Python) implementing the Aho-Corasick algorithm
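For readers unfamiliar with the algorithm, here is a minimal pure-Python sketch of Aho-Corasick matching. It is not pyahocorasick's implementation, and the names `build_automaton` and `find_all` are made up for this illustration. It builds a trie, adds failure links with a breadth-first pass, and then scans the text once, reporting every key occurrence, including overlapping ones.

```python
from collections import deque


def build_automaton(words):
    """Build the trie, failure links, and output lists for the given words."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state for the longest proper suffix in the trie
    out = [[]]    # out[state] -> words that end when this state is reached
    for word in words:
        if not word:
            continue  # empty keys are skipped in this sketch
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)

    # Breadth-first pass: depth-1 states keep fail == 0, deeper states
    # derive their failure link from their parent's failure link.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the words recognised by the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, word) for every occurrence of any word in text."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists or we hit the root.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            matches.append((i - len(word) + 1, word))
    return matches
```

The text is scanned a single time no matter how many keys are stored, which is the property that makes this approach attractive for large key sets.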
Version: 1.4
Python 2.7.15

    class TEST():
        def __init__(self, input_filename):
            self.ac = ahocorasick.Automaton()
            n_word = 0
            with open(input_filename) as f:
                for text in f:
                    n_word += 1
                    word = text.strip()
                    self.ac.add_word(word,...
Demo code:

    import os
    import psutil
    import ahocorasick

    def build_automaton():
        automation = ahocorasick.Automaton()
        for i in range(2000000):
            automation.exists(str(i))

    def show_used_memory():
        print('memory used: {} M'.format(psutil.Process(os.getpid()).memory_info().rss / (1024. ** 2)))

    if...
```
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -DAHOCORASICK_UNICODE= -I/home/pombreda/pyahocorasick/venv/include -I/home/pombreda/.pyenv/versions/3.6.10/include/python3.6m -c pyahocorasick.c -o build/temp.linux-x86_64-3.6/pyahocorasick.o
In file included from Automaton.c:1201:0,
                 from pyahocorasick.c:29:
Automaton_pickle.c: In function ‘automaton_unpickle’:
Automaton_pickle.c:363:17:...
```
Travis supported this. It is not clear whether GitHub Actions supports it.
Hi, thanks for the great work! I am wondering whether case-insensitive string matching is supported. For example, when there is "information system" in the built trie, and it can match...
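One common workaround for questions like this is to normalize case on both sides: lowercase the keys when they are stored and lowercase the text before searching, while keeping the original key to report back. The sketch below illustrates only that normalization idea. It uses a naive `str.find` scan as a stand-in for the automaton, and `find_case_insensitive` is a hypothetical helper, not part of pyahocorasick.

```python
def find_case_insensitive(text, keys):
    """Return sorted (start, end, original_key) tuples for case-insensitive hits."""
    lowered_text = text.lower()
    hits = []
    for key in keys:
        needle = key.lower()
        if not needle:
            continue  # skip empty keys
        start = lowered_text.find(needle)
        while start != -1:
            # Report the key as originally given, not its lowercased form.
            hits.append((start, start + len(needle), key))
            start = lowered_text.find(needle, start + 1)
    return sorted(hits)
```

Note that `str.lower()` can change the length of a few Unicode characters, so offsets computed on the lowered text may not line up with the original text in those cases.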
Hello! Thank you for the great library. I need to search the input string for multiple keys stored in the automaton. The number of keys is large and they overlap...
Hi, I'm just starting to discover and use your library to extract terms defined in a thesaurus from an input text, and "highlight" them in an HTML output. It works...
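As a rough illustration of the highlighting step described here, the sketch below takes match spans as `(start, end)` pairs (end exclusive) and wraps them in `<mark>` tags. The `highlight` function and the choice of `<mark>` are assumptions for this example. Overlapping or touching spans are merged first so the generated tags never nest incorrectly, and all text is HTML-escaped.

```python
import html


def highlight(text, spans):
    """Wrap each (start, end) span of text in <mark> tags, escaping all text."""
    # Merge overlapping or touching spans so the markup stays well-formed.
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    parts = []
    pos = 0
    for start, end in merged:
        parts.append(html.escape(text[pos:start]))  # plain text before the match
        parts.append("<mark>" + html.escape(text[start:end]) + "</mark>")
        pos = end
    parts.append(html.escape(text[pos:]))  # trailing plain text
    return "".join(parts)
```

If a matcher reports end-inclusive indices instead, the spans would need to be converted to this end-exclusive form before calling the function.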
An attempt to solve #102. What was done: instead of creating pickle data before pickling, we create an iterator which was meant to yield small portions of data on demand....
Hi, is there a way to determine a prepared Automaton's memory footprint? It could be helpful for using it in a size-limited cache. Thank you.