Tokenizer topic

Repositories tagged with the tokenizer topic

tokenizers

184 stars, 25 forks

Fast, Consistent Tokenization of Natural Language Text

vibrato

303 stars, 14 forks

🎤 vibrato: Viterbi-based accelerated tokenizer

megamark

103 stars, 9 forks

Markdown with easy tokenization, a fast highlighter, and a lean HTML sanitizer

chinese-tokenizer

91 stars, 24 forks

Tokenizes Chinese texts into words.

neural_tokenizer

64 stars, 9 forks

Tokenize English sentences using neural networks.

string-calc

98 stars, 17 forks

PHP calculator library for mathematical terms (expressions) passed as strings.

EBNFParser

64 stars, 6 forks

Convenient parser generator for Python (check out https://github.com/thautwarm/RBNF for an advanced version).

GreynirServer

65 stars, 17 forks

The greynir.is Icelandic natural language processing API and website.

hunspell

107 stars, 44 forks

High-Performance Stemmer, Tokenizer, and Spell Checker for R

kortok

114 stars, 11 forks

The code and models for "An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks" (AACL-IJCNLP 2020)