cjk-tokenizer topic

List cjk-tokenizer repositories

friso

472 Stars, 94 Forks

High-performance Chinese tokenizer with GBK and UTF-8 charset support, based on the MMSEG algorithm and written in ANSI C. Fully modular implementation that can be easily embedded in other...