YOLOv6 multi-node, multi-GPU training issue

Hi, I ran into the following problem when training on multiple machines with multiple GPUs. Is there something I need to change? RuntimeError: Address already in use

Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.

Jul 14 '22 11:07 FL77N

Add --master_port 30001 (or another unused port number) to the launch command, for example:

python -m torch.distributed.launch --nproc_per_node 8 --master_port 30002  tools/train.py ...
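On a multi-node job, every machine runs its own launch command, and they must all agree on --nnodes, --master_addr and --master_port while each gets a distinct --node_rank (0 on the master). The sketch below only assembles those per-node command lines; it assumes the standard torch.distributed.launch flags, and the master IP 10.0.0.1, the 2 x 8 GPU layout and the bare "tools/train.py" argument are hypothetical placeholders (the real YOLOv6 training arguments are left out).

```python
import shlex
from typing import List, Sequence


def build_launch_cmd(
    node_rank: int,
    nnodes: int,
    nproc_per_node: int,
    master_addr: str,
    master_port: int,
    script_args: Sequence[str],
) -> List[str]:
    """Build one node's torch.distributed.launch command as an argv list.

    All nodes share nnodes, master_addr and master_port; only node_rank differs.
    """
    if not 0 <= node_rank < nnodes:
        raise ValueError("node_rank must be in [0, nnodes)")
    return [
        "python", "-m", "torch.distributed.launch",
        "--nproc_per_node", str(nproc_per_node),
        "--nnodes", str(nnodes),
        "--node_rank", str(node_rank),
        "--master_addr", master_addr,
        "--master_port", str(master_port),
        *script_args,
    ]


if __name__ == "__main__":
    # Hypothetical 2-node x 8-GPU job; 10.0.0.1 stands in for the master node's IP.
    for rank in range(2):
        cmd = build_launch_cmd(rank, 2, 8, "10.0.0.1", 30002, ["tools/train.py"])
        # " ".join(shlex.quote(...)) instead of shlex.join keeps this Python 3.6 compatible.
        print("node {}: {}".format(rank, " ".join(shlex.quote(c) for c in cmd)))
```

Running the same command with --node_rank 0 on more than one machine, or using different ports on different nodes, will prevent the nodes from joining the same job.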

Jul 14 '22 12:07 mtjhl

I have tried this, but I still get the same error.
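If the error persists after changing the port, one way to rule out a simple port collision is to test the chosen port on the master node (node_rank 0, where the rendezvous server listens) before relaunching. This is a minimal sketch using only Python's socket module; the helper names port_in_use and find_free_port are hypothetical, and the check covers all IPv4 interfaces.

```python
import socket


def port_in_use(port: int, host: str = "") -> bool:
    """Return True if a TCP bind to (host, port) fails, i.e. the port is taken.

    host="" means INADDR_ANY (all IPv4 interfaces).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return True
    return False


def find_free_port(host: str = "") -> int:
    """Let the OS pick an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    port = 30002  # the port used in the suggested launch command
    if port_in_use(port):
        print("port {} is busy; try --master_port {}".format(port, find_free_port()))
    else:
        print("port {} looks free".format(port))
```

The result is only a snapshot: another process can grab the port between this check and the launch, so a "free" answer does not rule out every cause of the error.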

Jul 14 '22 13:07 FL77N

Training works fine on a single GPU.

Jul 14 '22 13:07 FL77N

My environment is torch 1.8.1 + CUDA 9.0 + cuDNN 7.6.5, Python 3.6. Could this be causing the problem?

Jul 15 '22 01:07 FL77N

Usually that doesn't matter. Could you share the nvidia-smi output and a screenshot of the full error?

Jul 29 '22 04:07 shensheng272
