nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Hi, sorry to ask, but how can I use this? How can I provide training sets?
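For context, nanoGPT's data scripts turn raw text into flat binary files of token ids that train.py then reads. A rough sketch of preparing a custom text file in that spirit (the file names, split ratio, and use of the GPT-2 tokenizer via tiktoken are assumptions, not the repo's exact script):

```python
# Minimal sketch: turn a raw text file into train.bin / val.bin of uint16 token ids,
# in the style of the data/*/prepare.py scripts. Paths and split ratio are illustrative.
import numpy as np
import tiktoken

enc = tiktoken.get_encoding("gpt2")  # GPT-2 BPE tokenizer

with open("input.txt", "r", encoding="utf-8") as f:
    data = f.read()

n = len(data)
train_ids = enc.encode_ordinary(data[: int(n * 0.9)])  # first 90% for training
val_ids = enc.encode_ordinary(data[int(n * 0.9):])     # last 10% for validation

# GPT-2 token ids fit in uint16, which keeps the files compact
np.array(train_ids, dtype=np.uint16).tofile("train.bin")
np.array(val_ids, dtype=np.uint16).tofile("val.bin")
```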
https://huggingface.co/facebook/opt-30b is a larger open-source model. Does it work?
I really liked the simplicity of the globals() approach; this is one small improvement that adds argparse support, which gives a few things for free: `python train.py -h` now...
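A hedged sketch of what layering argparse on top of module-level config globals can look like (the variable names `batch_size`, `learning_rate`, and `out_dir` are illustrative, not the PR's exact code):

```python
# Minimal sketch: expose simple config globals as command-line flags with argparse.
import argparse

# default config values, normally plain globals at the top of train.py
batch_size = 12
learning_rate = 6e-4
out_dir = "out"

parser = argparse.ArgumentParser()
for name, default in list(globals().items()):
    # expose scalar/string globals as flags; bools would need extra handling
    if name.startswith("_") or isinstance(default, bool) or not isinstance(default, (int, float, str)):
        continue
    parser.add_argument(f"--{name}", type=type(default), default=default)
args = parser.parse_args()

# write parsed values back so the rest of the script sees the overrides
globals().update(vars(args))
print(f"batch_size={batch_size}, learning_rate={learning_rate}, out_dir={out_dir}")
```

With this, `python train.py -h` lists every config value, and `python train.py --batch_size 32` overrides just that one.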
Most people do not have access to 8×A100 40GB systems, but a single M1 Max laptop with 64 GB of memory could host the training. How difficult is it...
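As a rough sketch of what running on Apple silicon involves, PyTorch exposes an MPS backend that can stand in for CUDA; the fallback logic below is an assumption, not the repo's code, but the resulting string is the kind of `device` value the training config expects:

```python
# Minimal sketch: pick the best available backend on a given machine.
import torch

if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"  # Apple silicon GPU (M1/M2), no 8xA100 required
else:
    device = "cpu"

x = torch.randn(4, 4, device=device)  # quick smoke test on the chosen backend
print(f"running on {device}: mean={x.mean().item():.4f}")
```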
I tried to replicate the code on my laptop and I ran into many obstacles! After reading the code carefully, I realized that it had many conceptual gaps. So I...
This PR updates the GPT2 lm_head weight by linking it to the token embedding weights. This is done in the official GPT2 TF implementation [here](https://github.com/openai/gpt-2/blob/master/src/model.py#L171).
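The change amounts to weight tying: the output projection reuses the token embedding matrix so there is a single shared parameter. A minimal sketch in PyTorch (the class and dimensions are stand-ins, not the repo's actual model):

```python
# Minimal sketch of tying the lm_head weight to the token embedding (wte),
# as in the official GPT-2 TF implementation. Dimensions are illustrative.
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab_size=50257, n_embd=768):
        super().__init__()
        self.wte = nn.Embedding(vocab_size, n_embd)            # token embeddings
        self.lm_head = nn.Linear(n_embd, vocab_size, bias=False)
        self.lm_head.weight = self.wte.weight                   # weight tying: one shared parameter

    def forward(self, hidden):
        return self.lm_head(hidden)                             # logits over the vocabulary

model = TinyLM()
# both modules now point at the same underlying storage
assert model.lm_head.weight.data_ptr() == model.wte.weight.data_ptr()
```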
Enables training with larger effective batch sizes by taking multiple steps between gradient updates. I've always found this useful since batch size correlates strongly with performance even for small models...
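For reference, the idea is to accumulate gradients over several micro-batches before calling `optimizer.step()`, so the effective batch size becomes micro-batch size × accumulation steps. A hedged sketch of the pattern (the model, data, and hyperparameters are placeholders, not the PR's code):

```python
# Minimal sketch of gradient accumulation: N micro-batches per optimizer update.
import torch
import torch.nn as nn

model = nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
accum_steps = 4                                    # effective batch = micro_batch * accum_steps

optimizer.zero_grad(set_to_none=True)
for step in range(8):
    x, y = torch.randn(8, 16), torch.randn(8, 1)   # stand-in micro-batch
    loss = loss_fn(model(x), y) / accum_steps      # scale so gradients average over the group
    loss.backward()                                # gradients accumulate in .grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()                           # one update per accum_steps micro-batches
        optimizer.zero_grad(set_to_none=True)
```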
Hello again, Andrej. Would you mind making the training logs public so we can follow your progress in reproducing GPT-2? You can do this by clicking on the lock on...
I see the documentation on the hardware requirements for training. Any thoughts on what the requirements for inference are? Thank you!