Daniel Pressel
Results: 13 issues of Daniel Pressel
Add support for [OPT](https://arxiv.org/pdf/2205.01068.pdf). It is:
- a decoder-only model with learned-positional embeddings up to 2k
- same checkpoint structure as BART without encoder
- GPT2 byte-level tokenizer with a...
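As a rough illustration of the "learned-positional embeddings up to 2k" point, here is a minimal pure-Python sketch of a learned position table added to token vectors. It is not the code from the issue or from any OPT implementation: the names `LearnedPositionalEmbedding` and `embed` are hypothetical, "2k" is assumed to mean 2048 positions, and the random initialization only stands in for parameters that would normally be trained.

```python
import random


class LearnedPositionalEmbedding:
    """Hypothetical sketch: one vector per position, looked up by index."""

    def __init__(self, max_positions, dim, seed=0):
        rng = random.Random(seed)
        self.max_positions = max_positions
        self.dim = dim
        # One row per position. In a real model these rows are trained
        # parameters; here they are just small random numbers.
        self.table = [
            [rng.gauss(0.0, 0.02) for _ in range(dim)]
            for _ in range(max_positions)
        ]

    def __call__(self, seq_len):
        # A learned table cannot extrapolate past the positions it holds.
        if seq_len > self.max_positions:
            raise ValueError(
                f"sequence length {seq_len} exceeds {self.max_positions} positions"
            )
        return self.table[:seq_len]


def embed(token_vectors, pos_emb):
    """Add the matching position vector to each token vector, elementwise."""
    positions = pos_emb(len(token_vectors))
    return [
        [t + p for t, p in zip(tok, pos)]
        for tok, pos in zip(token_vectors, positions)
    ]


# Assumption: "2k" in the issue text means 2048 positions.
pos_emb = LearnedPositionalEmbedding(max_positions=2048, dim=4)
tokens = [[0.0] * 4 for _ in range(3)]  # three placeholder token vectors
hidden = embed(tokens, pos_emb)
```

Because the table has a fixed number of rows, inputs longer than the maximum length have to be truncated or rejected, which is why the sketch raises an error instead of silently wrapping around.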
Switches the basic dictionary-based config handling to OmegaConf and makes the main driver Hydra-style.
Just a tiny change so that it runs on Python 3.