
How to use multiple GPUs for training?

Open iwldzt3011 opened this issue 2 years ago • 4 comments

iwldzt3011 avatar Apr 05 '22 11:04 iwldzt3011

Lines 523-524 in main.py define the multi-GPU condition. Just run the file directly wherever you have multiple GPUs, e.g.

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py

wtliao avatar Apr 05 '22 13:04 wtliao
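The exact contents of lines 523-524 aren't quoted in this thread, but in PyTorch such a condition usually boils down to counting the visible devices and wrapping the model in nn.DataParallel when there is more than one. A torch-free sketch of just the decision (the helper name `use_data_parallel` is mine, not from the repo):

```python
def use_data_parallel(env_value):
    """Decide whether multi-GPU (DataParallel) mode should be used,
    given the raw CUDA_VISIBLE_DEVICES string (None when unset)."""
    if env_value is None:
        # Unset means every physical GPU is visible; the real script
        # would consult torch.cuda.device_count() here instead.
        return True
    # Ignore empty entries and surrounding whitespace.
    ids = [g.strip() for g in env_value.split(",") if g.strip()]
    return len(ids) > 1

print(use_data_parallel("0,1,2,3"))  # → True
print(use_data_parallel("0"))        # → False
```

With more than one id listed, the script would wrap the model, e.g. `model = nn.DataParallel(model)`, which splits each batch across the visible devices automatically; CUDA_VISIBLE_DEVICES only controls which physical cards count as visible.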

> Lines 523-524 in main.py define the multi-GPU condition. Just run the file directly wherever you have multiple GPUs, e.g.
>
> CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py

OK, I'll try it. Thank you very much.

iwldzt3011 avatar Apr 05 '22 13:04 iwldzt3011

> Lines 523-524 in main.py define the multi-GPU condition. Just run the file directly wherever you have multiple GPUs, e.g.
>
> CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py

When I train on multiple 2080 Tis I get an error: NCCL error 2, unhandled system error. Have you ever encountered this problem? Which version of torch do you use? The version I use is 1.10.

iwldzt3011 avatar Apr 05 '22 15:04 iwldzt3011
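For anyone hitting the same wall: "NCCL error 2: unhandled system error" on consumer cards like the 2080 Ti is often a transport problem (peer-to-peer copies or InfiniBand) rather than a torch-version issue. NCCL's standard environment variables, unrelated to this repo, can narrow it down:

```shell
# Make NCCL print its internal log so the failing transport is visible.
NCCL_DEBUG=INFO python main.py

# If the log implicates peer-to-peer copies or InfiniBand,
# disable those transports and fall back to shared-memory/socket paths.
NCCL_P2P_DISABLE=1 NCCL_IB_DISABLE=1 python main.py
```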

> Lines 523-524 in main.py define the multi-GPU condition. Just run the file directly wherever you have multiple GPUs, e.g. CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py

> When I train on multiple 2080 Tis I get an error: NCCL error 2, unhandled system error. Have you ever encountered this problem? Which version of torch do you use? The version I use is 1.10.

I solved this problem by downgrading torch to 1.6.0. However, I found that training on four 2080 Tis is about as fast as on a single 3080. Is an epoch in about 13 minutes normal?

iwldzt3011 avatar Apr 05 '22 16:04 iwldzt3011