word-segmentation topic

List of word-segmentation repositories

lac

3.8k stars, 588 forks

Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, and word importance

ekphrasis

660 stars, 92 forks

Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashta...
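
To illustrate the kind of word segmentation this description mentions, here is a minimal sketch of dictionary-based segmentation using dynamic programming over unigram counts. It is not Ekphrasis's implementation or API: the `segment` function, the `word_counts` table, and the unknown-word penalty are all assumptions made for this example.

```python
import math


def segment(text, word_counts, max_word_len=20):
    """Split `text` into the lowest-cost sequence of words.

    `word_counts` is a hypothetical unigram frequency table; a real
    system would load counts from a large corpus.
    """
    total = sum(word_counts.values())

    def cost(word):
        count = word_counts.get(word)
        if count:
            # Negative log-probability of a known word.
            return -math.log(count / total)
        # Assumed penalty for unknown words, growing with length.
        return math.log(total) + len(word) * math.log(10)

    # best[i] is the lowest cost of segmenting text[:i];
    # back[i] is the start index of the last word in that segmentation.
    best = [0.0] + [math.inf] * len(text)
    back = [0] * (len(text) + 1)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_word_len), end):
            candidate = best[start] + cost(text[start:end])
            if candidate < best[end]:
                best[end] = candidate
                back[end] = start

    # Walk the back-pointers from the end to recover the words.
    words = []
    end = len(text)
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


counts = {"word": 50, "segmentation": 5, "is": 100, "fun": 20, "seg": 1}
print(segment("wordsegmentationisfun", counts))
```

With the toy counts above, the hashtag-style string is split into its four dictionary words. The quality of the result depends entirely on the frequency table, which is why practical tools ship counts derived from large corpora.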

SymSpell

3.1k stars, 281 forks

SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
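
The Symmetric Delete idea named in this description can be sketched roughly as follows: index every dictionary term under all strings reachable by deleting a few characters, then generate the same deletes for the query and look them up. This is a simplified illustration, not SymSpell's actual code or API; the function names `deletes`, `build_index`, and `candidates` are assumptions, and a real implementation would also verify each candidate with a true edit-distance check and rank by frequency.

```python
def deletes(word, max_distance):
    """Return `word` plus every string formed by deleting up to
    `max_distance` characters from it."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        next_frontier = set()
        for w in frontier:
            for i in range(len(w)):
                next_frontier.add(w[:i] + w[i + 1:])
        results |= next_frontier
        frontier = next_frontier
    return results


def build_index(vocabulary, max_distance=1):
    """Map each delete variant to the dictionary terms that produce it."""
    index = {}
    for term in vocabulary:
        for variant in deletes(term, max_distance):
            index.setdefault(variant, set()).add(term)
    return index


def candidates(query, index, max_distance=1):
    """Collect dictionary terms sharing a delete variant with `query`.

    These are candidates only; this sketch does not filter them
    by actual edit distance.
    """
    found = set()
    for variant in deletes(query, max_distance):
        found |= index.get(variant, set())
    return found


index = build_index(["hello", "help", "world"])
print(sorted(candidates("helo", index)))
```

Because only deletions are generated, on both the dictionary side and the query side, the number of variants per word stays far smaller than it would if insertions, substitutions, and transpositions were enumerated as well.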

symspellpy

772 stars, 116 forks

Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

sentencepiece

9.7k stars, 1.1k forks

Unsupervised text tokenizer for Neural Network-based text generation.

YouTokenToMe

945 stars, 95 forks

Unsupervised text tokenizer focused on computational efficiency

pythainlp

933 stars, 271 forks

Thai Natural Language Processing in Python.

toiro

114 stars, 8 forks

A comparison tool for Japanese tokenizers

CWS

80 stars, 26 forks

Source code for an ACL 2016 paper on Chinese word segmentation