High Loss Value When Training NanoGPT on a Single Small GPU
Hello,
I'm working with the nanoGPT train.py script, following the "reproducing GPT-2" instructions and aiming to replicate the GPT-2 training run on OpenWebText data. Unlike the reference setup, which uses a multi-GPU A100 node, I used a single small GPU. I initially started with a batch size of 12, but because the GPU ran out of memory, I reduced it to 3. The only change I made to train.py is this:
batch_size = 3 # if gradient_accumulation_steps > 1, this is the micro-batch size
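For context, here is a rough sketch of how I understand the micro-batch to feed into the effective batch (tokens per optimizer step). The defaults below are my assumptions, not values I verified in my run:

```python
# Rough sketch of tokens per optimizer step in a nanoGPT-style setup.
# block_size and gradient_accumulation_steps below are assumed defaults.
block_size = 1024                  # context length
gradient_accumulation_steps = 40   # assumed default in train.py
ddp_world_size = 1                 # single GPU, no DDP

def tokens_per_iter(batch_size):
    return gradient_accumulation_steps * ddp_world_size * batch_size * block_size

print(tokens_per_iter(12))  # 491,520 tokens per step with the original micro-batch
print(tokens_per_iter(3))   # 122,880 tokens per step after dropping to 3

# Keeping the effective batch roughly constant after a 4x smaller micro-batch
# would mean raising gradient_accumulation_steps about 4x (e.g. 40 -> 160),
# at the cost of proportionally more wall-clock time per step.
```

If that is right, my change shrank the effective batch by 4x rather than just the per-step memory footprint.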
When running python train.py, the training loss plateaued at 7.5. To be specific, it reached this value after 2 days and stayed at that level until I cancelled the run after 4 days; that is much higher than the expected ~2.8. This leads me to a few questions:
- Is the high loss primarily due to the limited capacity of my smaller GPU?
- Does the reduction in batch size impact the learning efficacy of the model, beyond just slowing down training? (See the rough token-budget sketch after this list.)
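My rough back-of-the-envelope on the second question (the numbers are assumptions on my part, not values I checked against the configs):

```python
# Total-token budget over a full run, assuming the iteration count and
# gradient accumulation stay at what I believe the GPT-2 config uses.
max_iters = 600_000                        # assumed from config/train_gpt2.py
tokens_ref = max_iters * 40 * 12 * 1024    # micro-batch 12
tokens_small = max_iters * 40 * 3 * 1024   # micro-batch 3

print(f"reference run: {tokens_ref / 1e9:.0f}B tokens")    # ~295B
print(f"reduced run:   {tokens_small / 1e9:.0f}B tokens")  # ~74B
```

If that arithmetic holds, the batch_size=3 run takes noisier, smaller-batch steps and sees roughly a quarter of the tokens over the same number of iterations.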
Any advice on training NanoGPT effectively on limited hardware would be greatly appreciated, along with suggestions for any configuration adjustments.
Thank you for your time and insights.
What GPU? Also, are you really training the 124M model? That wouldn't train on a "single small GPU".
n_layer = 12
n_head = 12
n_embd = 768
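For completeness, a rough parameter count for that config, assuming GPT-2's vocabulary and context sizes and a weight-tied lm_head:

```python
# Back-of-the-envelope parameter count for the config above.
# vocab_size and block_size are assumed GPT-2 values;
# n_head does not affect the parameter count.
n_layer, n_head, n_embd = 12, 12, 768
vocab_size, block_size = 50257, 1024

emb = vocab_size * n_embd + block_size * n_embd    # token + position embeddings
per_block = (
    2 * n_embd                             # ln_1
    + n_embd * 3 * n_embd + 3 * n_embd     # attention qkv projection
    + n_embd * n_embd + n_embd             # attention output projection
    + 2 * n_embd                           # ln_2
    + n_embd * 4 * n_embd + 4 * n_embd     # mlp up-projection
    + 4 * n_embd * n_embd + n_embd         # mlp down-projection
)
total = emb + n_layer * per_block + 2 * n_embd     # plus final layernorm
print(f"{total / 1e6:.1f}M parameters")            # ~124.4M
```

So yes, this is the ~124M GPT-2 small configuration (nanoGPT's own printout is a bit lower, I believe because it leaves the position embeddings out of its count).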
Yeah, that is absolutely going to be the reason. Appreciate the response!