cjk-tokenizer topic
cjk-tokenizer repositories
friso
472 Stars
94 Forks
High-performance Chinese tokenizer with GBK and UTF-8 charset support, based on the MMSEG algorithm and written in ANSI C. Fully modular in design, so it can be easily embedded in other...