bpe topic

List bpe repositories

SubwordEncoding-CWS

55 Stars · 12 Forks

Subword Encoding in Lattice LSTM for Chinese Word Segmentation

subword-nmt

2.2k Stars · 463 Forks

Unsupervised Word Segmentation for Neural Machine Translation and Text Generation

Tokenizer

268 Stars · 66 Forks

Fast and customizable text tokenization library with BPE and SentencePiece support

YouTokenToMe

945 Stars · 95 Forks

Unsupervised text tokenizer focused on computational efficiency

nlp_made_easy

247 Stars · 33 Forks

Explains NLP building blocks in a simple manner.

python-bpe

220 Stars · 38 Forks

Byte Pair Encoding for Python!

piecelearn

19 Stars · 1 Fork

Learning BPE embeddings by first learning a segmentation model and then training word2vec

gpt-tokenizer

394 Stars · 31 Forks

JavaScript BPE Tokenizer Encoder Decoder for OpenAI's GPT-2 / GPT-3 / GPT-4. Port of OpenAI's tiktoken with additional features.

tiktoken-rs

209 Stars · 39 Forks

Ready-made tokenizer library for working with GPT and tiktoken