learned-tokenization topic

List learned-tokenization repositories

MEGABYTE-pytorch

620 stars, 52 forks

Implementation of MEGABYTE, from "Predicting Million-byte Sequences with Multiscale Transformers", in PyTorch

rvq-vae-gpt

77 stars, 1 fork

My attempts at applying the SoundStream design to learned tokenization of text, and then applying hierarchical attention to text generation