Tokenizer topic

Repositories tagged with the tokenizer topic

Chiffon

54 stars, 4 forks

A small ECMAScript parser, tokenizer and minifier written in JavaScript.

thot

50 stars, 12 forks

Thot toolkit for statistical machine translation

spacy-experimental

94 stars, 18 forks

🧪 Cutting-edge experimental spaCy components and features

wink-tokenizer

59 stars, 12 forks

Multilingual tokenizer that automatically tags each token with its type
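The idea of tagging each token with its type can be illustrated with a single alternation regex whose named groups act as the tags. The following is a minimal Python sketch of that general technique, not wink-tokenizer's implementation or API (wink-tokenizer is a JavaScript library). The tag names `url`, `email`, `number`, `word` and `punctuation` are assumptions chosen for illustration; wink-tokenizer's actual tag set may differ.

```python
import re
from typing import List, Tuple

# Hypothetical tag set: each named group is one token type.
# Order matters: earlier alternatives win when several could match.
_TOKEN_PATTERN = re.compile(
    r"(?P<url>https?://\S+)"
    r"|(?P<email>[\w.+-]+@[\w-]+\.[\w.-]+)"
    r"|(?P<number>\d+(?:[.,]\d+)*)"
    r"|(?P<word>\w+(?:'\w+)*)"  # \w is Unicode-aware for str patterns
    r"|(?P<punctuation>[^\w\s])"
)


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split text into (token, type) pairs; whitespace is skipped."""
    tokens = []
    for match in _TOKEN_PATTERN.finditer(text):
        # lastgroup is the name of the alternative that matched
        tokens.append((match.group(), match.lastgroup))
    return tokens


if __name__ == "__main__":
    print(tokenize("Café costs 3.50 euros!"))
```

Putting every token type in one compiled pattern keeps the scan to a single left-to-right pass; a real tokenizer would add more categories (emoji, hashtags, mentions) and handle edge cases such as trailing periods after URLs.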

python-vncorenlp

54 stars, 17 forks

A Python wrapper for VnCoreNLP using a bidirectional communication channel.

DumbLuaParser

29 stars, 3 forks

Lua parsing library capable of optimizing and minifying code.

JPOPHP

26 stars, 6 forks

JSON Parser Object PHP is a library for parsing JSON data.

greeb

15 stars, 7 forks

Greeb is a simple Unicode-aware regexp-based tokenizer.

tokenizer

145 stars, 18 forks

NLP tokenizers written in Go