nanoGPT
Training gpt2 on a single GPU
I have an RTX 4090. How long would it take to pre-train GPT-2 by only running `python train.py`?
That's really vague. You're going to have to give way more information than that, like the dataset size and the GPT-2 model size you want to pretrain.
The estimate from Llama-2-70b is found here.
For some real-life data points:
- When karpathy trained the GPT-2 124M model on OpenWebText on 8x A100, it took him 96 hours.
- When karpathy trained the model on tiny_shakespeare, it took him 3 minutes.
- In my fork of this project, NanoChatGPT, I fine-tune on a 270 MB dataset, and it takes half an hour for 50 iters.
@karpathy any better specs?
The default arguments in the train.py file. The dataset: OpenWebText.
As I said before, when karpathy trained the GPT-2 124M model on OpenWebText on 8x A100, it took him 96 hours. That's with the default values and OpenWebText.
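For a very rough single-GPU estimate, you can scale that figure by GPU count and an assumed per-GPU throughput ratio. The sketch below is back-of-envelope only; the 4090-to-A100 throughput factor is an assumption, so replace it with the iterations/sec you actually measure from train.py's own log output:

```python
# Back-of-envelope estimate of single-RTX-4090 pre-training time, scaled
# from the reported 8x A100 run. The throughput ratio is an assumption,
# not a benchmark -- measure your own iter/sec to refine it.
a100_gpus = 8
a100_hours = 96                      # reported wall-clock for GPT-2 124M on OpenWebText
gpu_hours = a100_gpus * a100_hours   # ~768 A100 GPU-hours in total

# Assumed relative throughput of one RTX 4090 vs one A100 on this workload
# (less VRAM forces a smaller micro-batch, so this is a guess).
rtx4090_vs_a100 = 1.0

single_4090_hours = gpu_hours / rtx4090_vs_a100
print(f"~{single_4090_hours:.0f} hours (~{single_4090_hours / 24:.0f} days) on one RTX 4090")
# -> roughly 768 hours, i.e. on the order of a month of continuous training
```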
@VatsaDev I'm thinking about modifying some params to reduce the training time (from scratch). Which params would you suggest I decrease so that I can still get a decent model?
@hmbui-noze, for a decent model I would always recommend fine-tuning. My repo nanoChatGPT has the hyperparams I use for fine-tuning, and these take ~26 min (there's a config-file sketch after the list):
eval_interval = 5
eval_iters = 40
always_save_checkpoint = False
batch_size = 4
gradient_accumulation_steps = 32
learning_rate = 2e-5
decay_lr = False
I use gpt2-medium as the base model; use gpt2-xl if you can.
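Here is a minimal sketch of those overrides collected into a nanoGPT-style config file, which train.py picks up when you pass it on the command line. The file name and the dataset name are hypothetical; set them to match your own data folder:

```python
# config/finetune_mydata.py -- sketch of the fine-tuning overrides above as a
# nanoGPT-style config file (file name and dataset name are hypothetical).
# Run with: python train.py config/finetune_mydata.py
init_from = 'gpt2-medium'   # base checkpoint to fine-tune; 'gpt2-xl' if it fits in memory
dataset = 'my_dataset'      # hypothetical: expects data/my_dataset/train.bin and val.bin
eval_interval = 5
eval_iters = 40
always_save_checkpoint = False
batch_size = 4
gradient_accumulation_steps = 32
learning_rate = 2e-5
decay_lr = False
```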
If you have a lot of GPU memory, I would just increase batch_size instead of decreasing anything else, up to the point where you're using about half of your memory.
Using a big batch size can speed up training by a lot. If you have a very small dataset, increase the learning rate and decrease eval_iters.
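One way to reason about the batch-size advice: what matters for optimization is the effective batch, i.e. how many tokens are processed per optimizer step. A small sketch of that relationship, assuming the GPT-2 default block_size of 1024:

```python
# Effective tokens per optimizer step in nanoGPT-style training:
# batch_size (micro-batch) x gradient_accumulation_steps x block_size.
# Raising batch_size and lowering gradient_accumulation_steps by the same
# factor keeps this constant while doing fewer, larger forward/backward passes.
batch_size = 4
gradient_accumulation_steps = 32
block_size = 1024                    # assumed GPT-2 context length

tokens_per_iter = batch_size * gradient_accumulation_steps * block_size
print(f"tokens per iteration: {tokens_per_iter:,}")   # 131,072

# Example trade: if your GPU memory allows batch_size=16, you could drop
# gradient_accumulation_steps to 8 and keep the same effective batch.
assert 16 * 8 * block_size == tokens_per_iter
```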