chinese-tokenizer topic

List chinese-tokenizer repositories

friso

472 stars, 94 forks

High-performance Chinese tokenizer written in ANSI C and based on the MMSEG algorithm, supporting both the GBK and UTF-8 charsets. Fully modular implementation that can be easily embedded in other...

MicroTokenizer

143 stars, 22 forks

A micro Chinese tokenizer (word-segmentation engine) implementing a comprehensive range of algorithms

Chinese_tokenizer_benchmark

23 stars, 5 forks

Benchmark of Chinese tokenizer (word-segmentation) software

PaddleTokenizer

15 stars, 2 forks

A Chinese tokenizer (word-segmentation engine) based on deep neural networks, implemented with PaddlePaddle

HanziNLP

17 stars, 1 fork

A lightweight NLP package for Chinese text: preprocessing, tokenization, Chinese fonts, word embeddings, text similarity, and sentiment analysis