nsff_pl
Multi-GPU training issue
I changed the default parameters num_gpus → 8 and num_nodes → 8, kept all other defaults unchanged, and ran train.py on 8× V100s, but it always gets stuck at device initialization and never reaches the data-loading stage. Are there any additional settings required? 🤔

I also tested other multi-GPU counts and they all get stuck; training only runs successfully with num_gpus=1.
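For reference, a minimal sketch of how the two parameters are usually interpreted in a PyTorch Lightning 1.x-style Trainer (which nsff_pl builds on); this is an illustration under that assumption, not the repo's actual opt.py/train.py code:

```python
# Sketch only: typical PyTorch Lightning 1.x Trainer setup for one 8-GPU machine.
# Assumption: num_gpus maps to Trainer(gpus=...) and num_nodes to Trainer(num_nodes=...).
import pytorch_lightning as pl

trainer = pl.Trainer(
    gpus=8,             # GPUs *per machine*
    num_nodes=1,        # number of *machines*; usually 1 for a single 8-GPU node
    accelerator="ddp",  # distributed data parallel backend (PL <= 1.4 naming)
)
```

With this convention, num_nodes counts machines rather than GPUs, so a DDP run launched with num_nodes > 1 on a single machine will wait for the other nodes to join, which could look like a hang at device initialization.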