
Request for Multi-GPU Training

Open · gyuwonchoi opened this issue 2 years ago · 1 comment

Hi, thank you for sharing the code of your work.

While reviewing the './tools/train.py' script, I noticed that multi-GPU mode is not supported.

I was wondering if there is an alternative way for me to train the model using MMDistributedDataParallel. I have NVIDIA TITAN V (12 GB) GPUs, which cannot fit the Transformer-based model on a single GPU.

    # from tools/train.py
    if args.gpus is not None:
        cfg.gpu_ids = range(4)
        warnings.warn('`--gpus` is deprecated because we only support '
                      'single GPU mode in non-distributed training. '
                      'Use `gpus=1` now.')
    if args.gpu_ids is not None:
        cfg.gpu_ids = args.gpu_ids[0:3]
        warnings.warn('`--gpu-ids` is deprecated, please use `--gpu-id`. '
                      'Because we only support single GPU mode in '
                      'non-distributed training. Use the first GPU '
                      'in `gpu_ids` now.')

Thank you in advance for your response.

gyuwonchoi · Apr 24 '23 16:04

Hi gyuwonchoi, thanks for your interest in our work!

We haven't tried training on multiple GPUs, but I assume the short answer is yes.

We base our method on MMSegmentation, and its documentation explains how to train on multiple GPUs.

Since we use run_experiments.py in place of tools/train.py, the usage will differ somewhat; e.g., the entry point in tools/dist_train.sh should change accordingly.
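For reference, the standard MMSegmentation launch and a possible adaptation might look like the following. The second command is only a sketch: it assumes run_experiments.py accepts a `--launcher pytorch` flag the way tools/train.py does, which I haven't verified against this repository, and EXP_ARGS stands in for whatever arguments you normally pass.

```shell
# Standard MMSegmentation multi-GPU launch:
bash tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM}

# Hypothetical adaptation: point the distributed launcher at
# run_experiments.py instead of tools/train.py (unverified sketch).
python -m torch.distributed.launch \
    --nproc_per_node=${GPU_NUM} \
    --master_port=29500 \
    run_experiments.py ${EXP_ARGS} --launcher pytorch
```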

Also, samples_per_gpu may need to be adjusted (e.g., from 2 to 1 when using two GPUs) to keep the overall training batch size unchanged (2 source + 2 target = 4 in total). An explanation can be found here.
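To illustrate the bookkeeping, here is a minimal sketch of how the effective batch size stays constant; the helper function and its assumption that each sample pairs one source with one target image are mine, not code from the repository:

```python
def effective_batch(num_gpus, samples_per_gpu):
    """Effective UDA batch, assuming each sample pairs one source
    and one target image (as in MMSegmentation-style UDA setups)."""
    per_domain = num_gpus * samples_per_gpu
    return {"source": per_domain, "target": per_domain,
            "total": 2 * per_domain}

# Single GPU with samples_per_gpu=2 ...
print(effective_batch(1, 2))  # {'source': 2, 'target': 2, 'total': 4}
# ... matches two GPUs with samples_per_gpu=1.
print(effective_batch(2, 1))  # {'source': 2, 'target': 2, 'total': 4}
```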

Apologies for not having time to delve into this right now. Any feedback is welcome if you are willing to try it out!

Best.

KiwiXR · Apr 25 '23 06:04