Multi-GPU training issue

Open aha831 opened this issue 3 years ago • 1 comments

I changed the default parameters num_gpus to 8 and num_nodes to 8, kept all other defaults unchanged, and ran train.py on 8 V100 GPUs. It always gets stuck at device initialization and never reaches the data-reading stage. Are any additional settings required? 🤔

Sep 24 '22 03:09 aha831

I also tested other multi-GPU counts and they all get stuck. Training only runs successfully when gpus=1.

Sep 24 '22 03:09 aha831
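
One possible reading of the settings above, offered as an assumption rather than a confirmed diagnosis: in most distributed training setups, num_nodes counts machines and num_gpus counts GPUs per machine, so the job waits for num_gpus * num_nodes processes before it leaves initialization. The sketch below only does that arithmetic. The function names are hypothetical and not part of this repository, and it assumes the 8 V100s sit in a single machine, which the report does not state explicitly.

```python
"""Sketch: how many processes a distributed job waits for, given the settings above."""


def expected_world_size(num_gpus: int, num_nodes: int) -> int:
    """Total processes the job expects: GPUs per machine times machine count."""
    return num_gpus * num_nodes


def missing_processes(num_gpus: int, num_nodes: int, machines_launched: int) -> int:
    """Processes that never join if only `machines_launched` machines run the script."""
    started = num_gpus * machines_launched
    return max(expected_world_size(num_gpus, num_nodes) - started, 0)


if __name__ == "__main__":
    # Settings from the report: num_gpus=8, num_nodes=8.
    # Assumption: a single machine with 8 GPUs launched the script.
    print(expected_world_size(8, 8))   # processes the job would wait for
    print(missing_processes(8, 8, 1))  # processes that would never start
    # Hypothetical alternative for a single machine: num_nodes=1.
    print(missing_processes(8, 1, 1))
```

If that assumption holds, rerunning with num_gpus=8 and num_nodes=1 would be a reasonable first check. Whether it resolves the hang is not confirmed anywhere in this thread.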