tokenizer topic
friso
High-performance Chinese tokenizer with GBK and UTF-8 charset support, based on the MMSEG algorithm and written in ANSI C. Fully modular in implementation and can be easily embedded in other...
json
🔋 In-place lightweight JSON parser
cang-jie
Chinese tokenizer for tantivy, based on jieba-rs
rustfst
Rust re-implementation of OpenFST, a library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). A Python binding is also available.
tokenizer
A small library for converting tokenized PHP source code into XML (and potentially other formats)
simplemma
Simple multilingual lemmatizer for Python, designed for speed and efficiency
SNL-Compiler
SNL (Small Nested Language) compiler. Maven, JUnit, tokenizer, lexer, syntax parser. Compiler principles, lexical analysis, syntax analysis.