taming-transformers icon indicating copy to clipboard operation
taming-transformers copied to clipboard

gpu training hangs

Open shyakocat opened this issue 1 year ago • 2 comments

Related Issue

training with --gpus 0, --gpus 0,1 hangs at initializing ddp ...

I've tried, change ddp to dp, change nccl to gloo, upgrade lightning or torch version... still not work.

(only cpu works)

shyakocat avatar Jan 22 '24 19:01 shyakocat

@shyakocat I encountered the same issue, have you resolved it?

dyflional10 avatar Jan 26 '24 10:01 dyflional10

try running export NCCL_P2P_DISABLE=1 before running the script it worked for me

froestiago avatar Mar 18 '24 10:03 froestiago