chinese-word-segmentation topic
friso
High-performance Chinese tokenizer with both GBK and UTF-8 charset support, based on the MMSEG algorithm and developed in ANSI C. Fully modular implementation that can be easily embedded in other...
jcseg
Jcseg is a lightweight NLP framework developed in Java. It provides CJK and English segmentation based on the MMSEG algorithm, along with keyword extraction, key sentence extraction, and summary extraction imp...
cjieba-py
Python cffi binding to CppJieba
g2pC
g2pC: A Context-aware Grapheme-to-Phoneme Conversion module for Chinese
WordSeg
A PyTorch implementation of BiLSTM / BERT / RoBERTa (+ BiLSTM + CRF) models for Chinese word segmentation.
DeepLearning_NLP
A natural language processing library based on deep learning
MicroTokenizer
A micro Chinese word segmentation engine with comprehensive algorithm coverage
Chinese-Word-Vectors
100+ pre-trained Chinese word vectors
monpa
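MONPA (罔拍) is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition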
Jiagu
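Jiagu, a deep learning natural language processing toolkit: knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keyword extraction, text summarization, text clustering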