nanoGPT
Multi-GPU training is very slow
I used 4 GPUs on one node:
torchrun --standalone --nproc_per_node=4 train.py --compile=False
But the training speed is the same as with a single GPU. Why?
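One quick way to check whether torchrun actually launched one worker per GPU is to print each process's rank. This is a minimal sketch, not part of nanoGPT itself; it relies only on the `RANK` and `WORLD_SIZE` environment variables that torchrun exports to every worker, falling back to single-process defaults when run directly:

```python
import os

def worker_identity(env=os.environ):
    """Return (rank, world_size) as set by torchrun.

    Falls back to (0, 1) so the script also runs without torchrun.
    """
    return int(env.get("RANK", 0)), int(env.get("WORLD_SIZE", 1))

rank, world_size = worker_identity()
print(f"rank {rank} of {world_size}")
```

If launched with `torchrun --standalone --nproc_per_node=4`, this should print four lines with ranks 0 through 3 and a world size of 4; a single line with world size 1 means only one process was started.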