mmdetection

How to set up a multi-GPU environment like dist_train.sh?

Open • hoya-cho opened this issue 1 year ago • 1 comment

How can I set up a multi-GPU environment from inside my own program, the way dist_train.sh does with torch.distributed.launch?

Even when the master address and master port are set in os.environ, local_rank never takes more than one value, because only one GPU is picked up.

torch.cuda.device_count() reports 2 GPUs, but the environment configured through os.environ only sees 1.
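One possible explanation is that setting the master address and port only tells a process where to rendezvous; it does not start any additional processes. A launcher such as torch.distributed.launch starts one process per GPU and gives each its own RANK, LOCAL_RANK and WORLD_SIZE. The sketch below imitates that for a single node using only the standard library. The helper names `rank_environments` and `launch`, the default address and the default port are assumptions made for illustration, not part of mmdetection or PyTorch.

```python
import os
import subprocess
import sys


def rank_environments(nproc, master_addr="127.0.0.1", master_port=29500):
    """Build one environment dict per worker process on a single node."""
    envs = []
    for local_rank in range(nproc):
        env = dict(os.environ)  # inherit the parent environment
        env.update({
            "MASTER_ADDR": master_addr,
            "MASTER_PORT": str(master_port),
            "WORLD_SIZE": str(nproc),       # total number of processes
            "RANK": str(local_rank),        # global rank; equals local rank on one node
            "LOCAL_RANK": str(local_rank),  # index of the GPU this process should use
        })
        envs.append(env)
    return envs


def launch(script_args, nproc):
    """Start one Python process per GPU and return their exit codes."""
    procs = [
        subprocess.Popen([sys.executable, *script_args], env=env)
        for env in rank_environments(nproc)
    ]
    return [p.wait() for p in procs]
```

With 2 GPUs, a call such as `launch(["tools/train.py", CONFIG, "--launcher", "pytorch"], nproc=2)` would start two training processes, where `CONFIG` stands for your own config path and `nproc` would normally come from torch.cuda.device_count(). Each process then finds a different LOCAL_RANK in its environment.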


Runtime environment:
    cudnn_benchmark: True
    mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
    dist_cfg: {'backend': 'nccl'}
    seed: None
    Distributed launcher: pytorch
    Distributed training: True
    GPU number: 1
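The log reports `GPU number: 1` although the launcher is `pytorch`, which would be consistent with the process seeing a world size of 1 (this reading of the log is an assumption). A small check like the one below, run at the top of the program, can show which of the variables used by PyTorch's env:// initialization are actually set in the current process. `describe_dist_env` is a hypothetical helper written for this example.

```python
import os


def describe_dist_env(env=None):
    """Report which distributed-launch variables are set in the environment."""
    env = os.environ if env is None else env
    keys = ("MASTER_ADDR", "MASTER_PORT", "WORLD_SIZE", "RANK", "LOCAL_RANK")
    missing = [k for k in keys if k not in env]
    # WORLD_SIZE is left as None when unset rather than guessing a default.
    world_size = int(env["WORLD_SIZE"]) if "WORLD_SIZE" in env else None
    return {"missing": missing, "world_size": world_size}


if __name__ == "__main__":
    print(describe_dist_env())
```

If only the master address and port were set by hand, the result lists WORLD_SIZE, RANK and LOCAL_RANK as missing, which suggests no launcher created per-GPU processes.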

hoya-cho • Jun 19 '23 10:06

Hi @hoya-cho, I have the same problem. Did you manage to solve it?

narchitect • May 07 '24 16:05