LIGA-Stereo

scripts/dist_train.sh

Open JangChangWon opened this issue 2 years ago • 2 comments

Hello, thanks for your excellent work!

I have several problems with distributed training.

When I run "CUDA_VISIBLE_DEVICE=0 python3 tools/train.py --cfg_file ${cfg} --batch_size 1" or "CUDA_VISIBLE_DEVICE=0 ./scripts/dist_train.sh 1 exp cfg_path", training works. But when I run "python3 tools/train.py --cfg_file ${cfg} --batch_size 1", "CUDA_VISIBLE_DEVICE=0,1,2,3 python3 tools/train.py --cfg_file ${cfg} --batch_size 1", or "CUDA_VISIBLE_DEVICE=0,1,2,3 ./scripts/dist_train.sh 4 exp cfg_path", it does not work. How should I modify the code to get distributed training working?

JangChangWon avatar Apr 13 '22 04:04 JangChangWon

I guess that you should set NGPUS=5 instead of 4. (CUDA_VISIBLE_DEVICE=0,1,2,3,4 ==> 5 GPUs)

zjwzcx avatar Apr 16 '22 14:04 zjwzcx

I guess that you should set NGPUS=5 instead of 4. (CUDA_VISIBLE_DEVICE=0,1,2,3,4 ==> 5 GPUs)

I wrote it down wrong. Thank you for letting me know.

JangChangWon avatar Apr 17 '22 05:04 JangChangWon
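
For anyone hitting the same mismatch: the point of the answer above is that the GPU count passed to scripts/dist_train.sh has to equal the number of device IDs made visible to the process. A minimal sketch of a pre-launch check follows. The helper names `visible_gpu_count` and `ngpus_matches` are hypothetical and not part of LIGA-Stereo. Note that CUDA reads the variable spelled `CUDA_VISIBLE_DEVICES` (plural); a differently spelled variable is ignored and every GPU on the machine stays visible.

```python
import os


def visible_gpu_count(env=None):
    """Count the device IDs listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset, because CUDA then exposes
    every GPU on the machine and the count cannot be read from the env.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    # Ignore empty entries such as a trailing comma.
    return len([dev for dev in value.split(",") if dev.strip()])


def ngpus_matches(ngpus, env=None):
    """Return True if the requested GPU count agrees with the visible devices."""
    count = visible_gpu_count(env)
    # With the variable unset there is nothing to contradict the request.
    return count is None or count == ngpus


if __name__ == "__main__":
    requested = 4  # hypothetical value: the first argument given to dist_train.sh
    if not ngpus_matches(requested):
        print("Requested", requested, "GPUs but CUDA_VISIBLE_DEVICES lists",
              visible_gpu_count())
```

Running a check like this before launching would flag a case such as five visible IDs with a requested count of 4, which is the situation described in this thread.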