learned-tokenization topic


MEGABYTE-pytorch

592 stars, 49 forks

Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in PyTorch
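MEGABYTE works on raw bytes by grouping them into fixed-size patches. A large global model attends across patches, and a smaller local model predicts the bytes inside each patch. The sketch below shows only the patching step, in plain Python. The function name `to_patches`, the right-padding of the tail, and the pad value of 0 are assumptions made for illustration. They are not the MEGABYTE-pytorch API.

```python
def to_patches(data: bytes, patch_size: int, pad_byte: int = 0) -> list[list[int]]:
    """Split a byte string into fixed-size patches of byte values.

    Assumption for this sketch: the tail is right-padded with the
    hypothetical `pad_byte` so every patch has exactly `patch_size` bytes.
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    remainder = len(data) % patch_size
    if remainder:
        data = data + bytes([pad_byte]) * (patch_size - remainder)
    # Each patch is a list of integers in 0..255, one per byte.
    return [list(data[i:i + patch_size]) for i in range(0, len(data), patch_size)]


if __name__ == "__main__":
    print(to_patches(b"hello world", 4))
    # [[104, 101, 108, 108], [111, 32, 119, 111], [114, 108, 100, 0]]
```

With 11 input bytes and a patch size of 4, the sketch produces three patches. The last patch carries one pad byte. A global model would then see 3 positions instead of 11, which is the source of the cost savings the multiscale design aims for.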

rvq-vae-gpt

73 stars, 1 fork

My attempts at applying the SoundStream design to learned tokenization of text, and then applying hierarchical attention to text generation
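SoundStream quantizes its latent vectors with residual vector quantization (RVQ). Each quantizer stage picks the codeword nearest to the residual left by the previous stages, so one vector becomes a short tuple of discrete codes. The sketch below illustrates that idea in plain Python with tiny hand-picked codebooks. The function names (`rvq_encode`, `rvq_decode`) and the codebook values are hypothetical. In a trained model the codebooks are learned rather than fixed, and this code is not taken from the rvq-vae-gpt repository.

```python
def nearest(vec: list[float], codebook: list[list[float]]) -> int:
    # Index of the codeword with the smallest squared Euclidean distance to vec.
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, codebook[i])),
    )


def rvq_encode(vec: list[float], codebooks: list[list[list[float]]]) -> list[int]:
    # Each stage quantizes what earlier stages could not represent.
    residual = list(vec)
    codes = []
    for cb in codebooks:
        idx = nearest(residual, cb)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return codes


def rvq_decode(codes: list[int], codebooks: list[list[list[float]]]) -> list[float]:
    # Reconstruction is the sum of the chosen codeword from every stage.
    out = [0.0] * len(codebooks[0][0])
    for idx, cb in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out


if __name__ == "__main__":
    # Hypothetical 2-D codebooks: a coarse first stage and a finer second stage.
    codebooks = [
        [[0.0, 0.0], [1.0, 1.0]],
        [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]],
    ]
    codes = rvq_encode([1.2, 0.8], codebooks)
    print(codes)                          # [1, 1]
    print(rvq_decode(codes, codebooks))   # [1.25, 0.75]
```

The first stage captures the coarse position and the second stage corrects the remaining offset. Adding more stages would refine the approximation further while keeping each code a small integer. This is what makes RVQ codes usable as a discrete token vocabulary.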