daac-tools
daachorse
🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure in Rust.
vaporetto
🛥 Vaporetto: Very accelerated pointwise-prediction-based tokenizer
vibrato
🎤 vibrato: Viterbi-based accelerated tokenizer
crawdad
🦞 Rust library of natural language dictionaries using character-wise double-array tries.
python-vaporetto
🛥 Vaporetto is a fast and lightweight pointwise-prediction-based tokenizer. This is a Python wrapper for Vaporetto.
python-vibrato
Viterbi-based accelerated tokenizer (Python wrapper)
trie-match
Fast match expression optimized for string comparison
find-simdoc
Time- and memory-efficient search for all pairs of similar documents