Daniel Pressel

Results 13 issues of Daniel Pressel

Add support for [OPT](https://arxiv.org/pdf/2205.01068.pdf). It is:

- a decoder-only model with learned-positional embeddings up to 2k
- same checkpoint structure as BART without encoder
- GPT2 byte-level tokenizer with a...

Switches the basic dictionary handling to OmegaConf and makes the main driver Hydra-style.

Just a tiny change so that it runs on Python 3.