bpe-tokenizer topic

List bpe-tokenizer repositories

Amharic-Tokenizer

Stars: 96, Forks: 14, Watchers: 96

Syllable-aware BPE tokenizer for the Amharic language (አማርኛ) – fast, accurate, trainable.

rs-bpe

Stars: 25, Forks: 1, Watchers: 25

A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust