word-segmentation topic
lac
Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, word importance
ekphrasis
Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashta...
SymSpell
SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
symspellpy
Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
YouTokenToMe
Unsupervised text tokenizer focused on computational efficiency
pythainlp
Thai Natural Language Processing in Python.
ckip-transformers
CKIP Transformers
toiro
A comparison tool of Japanese tokenizers
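
The Symmetric Delete idea named in the SymSpell and symspellpy entries above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code of either library: the function names (`deletes`, `build_index`, `lookup`, `edit_distance`) and the three-word vocabulary are hypothetical. It also verifies candidates with plain Levenshtein distance and does no frequency ranking, both of which are simplifications made here. The core idea is that only deletions are generated, for both dictionary terms and the query, and any shared delete variant marks a candidate match.

```python
from collections import defaultdict
from itertools import combinations


def deletes(word, max_distance):
    """Return the word plus every string made by deleting up to max_distance characters."""
    results = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        # Choose which character positions to keep; order is preserved.
        for keep in combinations(range(len(word)), len(word) - k):
            results.add("".join(word[i] for i in keep))
    return results


def build_index(vocabulary, max_distance=2):
    """Map every delete variant to the dictionary terms that produce it."""
    index = defaultdict(set)
    for term in vocabulary:
        for variant in deletes(term, max_distance):
            index[variant].add(term)
    return index


def edit_distance(a, b):
    """Plain Levenshtein distance (assumption: used here only to verify candidates)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def lookup(query, index, max_distance=2):
    """Return sorted (distance, term) pairs for terms within max_distance of the query."""
    candidates = set()
    for variant in deletes(query, max_distance):
        candidates |= index.get(variant, set())
    scored = [(edit_distance(query, term), term) for term in candidates]
    return sorted((d, term) for d, term in scored if d <= max_distance)


if __name__ == "__main__":
    # Hypothetical toy vocabulary, for illustration only.
    index = build_index(["segment", "sentence", "token"])
    print(lookup("segmnt", index))
```

Because inserts and substitutions are never enumerated, the number of variants per word stays small, and the delete variants of the dictionary can be computed once ahead of time, leaving only the query's variants to generate at lookup.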