tokenizer topic
tokenizers
Fast, Consistent Tokenization of Natural Language Text
vibrato
🎤 vibrato: Viterbi-based accelerated tokenizer
megamark
😻 Markdown with easy tokenization, a fast highlighter, and a lean HTML sanitizer
chinese-tokenizer
Tokenizes Chinese texts into words.
neural_tokenizer
Tokenize English sentences using neural networks.
string-calc
PHP calculator library for mathematical terms (expressions) passed as strings
EBNFParser
Convenient parser generator for Python (check out https://github.com/thautwarm/RBNF for an advanced version).
GreynirServer
The greynir.is Icelandic natural language processing API and website.
hunspell
High-Performance Stemmer, Tokenizer, and Spell Checker for R
kortok
The code and models for "An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks" (AACL-IJCNLP 2020)