tokenizer topic

List tokenizer repositories

sacremoses

Stars: 479, Forks: 59

Python port of Moses tokenizer, truecaser and normalizer

lexmachine

Stars: 402, Forks: 31

Lex machinery for Go.

sentences

Stars: 424, Forks: 38

A multilingual command line sentence tokenizer in Golang

js-tokens

Stars: 481, Forks: 30

Tiny JavaScript tokenizer.

fugashi

Stars: 372, Forks: 31

A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.

vscode-blockman

Stars: 344, Forks: 16

VSCode extension to highlight nested code blocks

bitextor

Stars: 287, Forks: 43

Bitextor generates translation memories from multilingual websites

Tokenizer

Stars: 268, Forks: 66

Fast and customizable text tokenization library with BPE and SentencePiece support

lindera

Stars: 359, Forks: 36

A multilingual morphological analysis library.

simple

Stars: 503, Forks: 71

A SQLite3 FTS5 tokenizer extension for full-text search that supports Chinese and Pinyin