tokenizer topic

List tokenizer repositories

jumanpp

369
Stars
44
Forks

Juman++ (a Morphological Analyzer Toolkit)

udpipe

209
Stars
34
Forks

R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit

Wordless

673
Stars
88
Forks

An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation

moo

807
Stars
64
Forks

Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.

jflex

576
Stars
111
Forks

The fast scanner generator for Java™ with full Unicode support

query-translator

199
Stars
10
Forks

Query Translator is a search query translator with AST representation

ekphrasis

660
Stars
92
Forks

Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashta...

soynlp

907
Stars
181
Forks

A Python library for Korean natural language processing. It provides word extraction, tokenization, part-of-speech tagging, and preprocessing.

Mustard

689
Stars
18
Forks

🌭 Mustard is a Swift library for tokenizing strings when splitting by whitespace doesn't cut it.

SmoothNLP

619
Stars
113
Forks

Focused on explainable NLP techniques: An NLP Toolset With A Focus on Explainable Inference