16 GPUs per node
Hi, my system has 16 GPUs per node. However, if I run
torchrun --standalone --nproc_per_node=16 train.py config/train_gpt2.py
the training crashes.
How can I use all 16 GPUs?
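Without the actual error message it is hard to say what failed, but a first diagnostic step (a sketch, assuming the default NCCL backend that nanoGPT uses for DDP) is to rerun with NCCL's debug logging enabled and look at the last lines printed before the crash:
NCCL_DEBUG=INFO torchrun --standalone --nproc_per_node=16 train.py config/train_gpt2.py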
16 GPUs per node? Wouldn't that be 2 nodes of 8 GPUs each? Also, what GPUs are they?
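If it really is two 8-GPU nodes rather than one 16-GPU box, nanoGPT's README sketches the two-node DDP launch roughly like this (the IP address and port are placeholders for your master node):
Run on the first (master) node:
torchrun --nproc_per_node=8 --nnodes=2 --node_rank=0 --master_addr=123.456.123.456 --master_port=1234 train.py config/train_gpt2.py
Run on the worker node:
torchrun --nproc_per_node=8 --nnodes=2 --node_rank=1 --master_addr=123.456.123.456 --master_port=1234 train.py config/train_gpt2.py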
@spcrobocar you have an old mining rig with 16 GPUs connected over PCIe x1, right?
If your setup matches what @a0s described, you could partition the node and train on a subset of the GPUs. If that still crashes, the hardware itself (PCIe x1 bandwidth, GPU memory) may simply lack the capacity.
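One concrete way to do that (a sketch, assuming you want to pin the job to the first 8 cards; adjust the device list to your topology) is to mask the visible devices and launch fewer ranks:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --standalone --nproc_per_node=8 train.py config/train_gpt2.py
If 8 GPUs work but 16 do not, that points at the interconnect or one of the remaining cards rather than at the training script.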
Are you able to use 2 GPUs?
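That would be the smallest useful DDP test, i.e. something like:
torchrun --standalone --nproc_per_node=2 train.py config/train_gpt2.py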