tokenizer topic
jumanpp
Juman++ (a Morphological Analyzer Toolkit)
udpipe
R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
Wordless
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
moo
Optimised tokenizer/lexer generator! 🐄 Uses the sticky regex flag /y for performance. Moo.
jflex
The fast scanner generator for Java™ with full Unicode support
query-translator
Query Translator is a search query translator with AST representation
ekphrasis
Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashta...
soynlp
A Python library for Korean natural language processing. Provides word extraction, a tokenizer, part-of-speech tagging, and preprocessing.
Mustard
🌭 Mustard is a Swift library for tokenizing strings when splitting by whitespace doesn't cut it.
SmoothNLP
An NLP Toolset With A Focus on Explainable Inference