
16 GPU per node

Open spcrobocar opened this issue 5 months ago • 3 comments

Hi, my system has 16 GPUs per node. However, when I run torchrun --standalone --nproc_per_node=16 train.py config/train_gpt2.py, the training crashes. How can I use all 16 GPUs?

spcrobocar • Jan 16 '24 04:01