nanoGPT
Training gpt2 on a single GPU
I have an RTX 4090. How long would it take to pre-train GPT-2 by only running `python train.py`?
That's really vague. You're going to have to give way more information than that, like the dataset size and the GPT-2 model size you want to pretrain.
The estimate from Llama-2-70b is found here.
For some real-life data points:
- When karpathy trained the GPT-2 124M model on OpenWebText on 8x A100, it took him 96 hours.
- When karpathy trained the model on tiny_shakespeare, it took him 3 minutes.
- In my fork of this project, NanoChatGPT, I fine-tune on a 270 MB dataset, and it takes half an hour for 50 iters.
@karpathy any better specs?
The default arguments in the train.py file. The dataset: OpenWebText.
As I said before, when karpathy trained the GPT-2 124M model on OpenWebText on 8x A100, it took him 96 hours. That's with the default values and OpenWebText.
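For a very rough single-GPU estimate, you can scale that figure by GPU count and an assumed per-GPU throughput ratio. The sketch below is back-of-envelope only; the 4090-to-A100 throughput factor is an assumption, so replace it with the iterations/sec you actually measure from train.py's own log output:

```python
# Back-of-envelope estimate of single-RTX-4090 pre-training time, scaled
# from the reported 8x A100 run. The throughput ratio is an assumption,
# not a benchmark -- measure your own iter/sec to refine it.
a100_gpus = 8
a100_hours = 96                      # reported wall-clock for GPT-2 124M on OpenWebText
gpu_hours = a100_gpus * a100_hours   # ~768 A100 GPU-hours in total

# Assumed relative throughput of one RTX 4090 vs one A100 on this workload
# (less VRAM forces a smaller micro-batch, so this is a guess).
rtx4090_vs_a100 = 1.0

single_4090_hours = gpu_hours / rtx4090_vs_a100
print(f"~{single_4090_hours:.0f} hours (~{single_4090_hours / 24:.0f} days) on one RTX 4090")
# -> roughly 768 hours, i.e. on the order of a month of continuous training
```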
@VatsaDev I'm thinking about modifying some params to reduce the training time (from scratch). Which params would you suggest I decrease so that I can still get a decent model?
@hmbui-noze, for a decent model I would always recommend fine-tuning. My repo nanoChatGPT has the hyperparams I use for fine-tuning, and these take ~26 min (there's a config-file sketch after the list):
eval_interval = 5
eval_iters = 40
always_save_checkpoint = False
batch_size = 4
gradient_accumulation_steps = 32
learning_rate = 2e-5
decay_lr = False
I use gpt2-medium as the base model; use gpt2-xl if you can.
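Here is a minimal sketch of those overrides collected into a nanoGPT-style config file, which train.py picks up when you pass it on the command line. The file name and the dataset name are hypothetical; set them to match your own data folder:

```python
# config/finetune_mydata.py -- sketch of the fine-tuning overrides above as a
# nanoGPT-style config file (file name and dataset name are hypothetical).
# Run with: python train.py config/finetune_mydata.py
init_from = 'gpt2-medium'   # base checkpoint to fine-tune; 'gpt2-xl' if it fits in memory
dataset = 'my_dataset'      # hypothetical: expects data/my_dataset/train.bin and val.bin
eval_interval = 5
eval_iters = 40
always_save_checkpoint = False
batch_size = 4
gradient_accumulation_steps = 32
learning_rate = 2e-5
decay_lr = False
```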
If you have a lot of GPU memory, I would just increase batch_size instead of decreasing anything else, up to the point where you're using about half of your memory.
Using a big batch size can speed up training by a lot. If you have a very small dataset, increase the learning rate and decrease eval_iters.
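One way to reason about the batch-size advice: what matters for optimization is the effective batch, i.e. how many tokens are processed per optimizer step. A small sketch of that relationship, assuming the GPT-2 default block_size of 1024:

```python
# Effective tokens per optimizer step in nanoGPT-style training:
# batch_size (micro-batch) x gradient_accumulation_steps x block_size.
# Raising batch_size and lowering gradient_accumulation_steps by the same
# factor keeps this constant while doing fewer, larger forward/backward passes.
batch_size = 4
gradient_accumulation_steps = 32
block_size = 1024                    # assumed GPT-2 context length

tokens_per_iter = batch_size * gradient_accumulation_steps * block_size
print(f"tokens per iteration: {tokens_per_iter:,}")   # 131,072

# Example trade: if your GPU memory allows batch_size=16, you could drop
# gradient_accumulation_steps to 8 and keep the same effective batch.
assert 16 * 8 * block_size == tokens_per_iter
```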