YOLOv6
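Multi-node, multi-GPU training problem
Hello, when I train on multiple nodes with multiple GPUs, I get the following error. Is there something I need to change? RuntimeError: Address already in use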
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
Add --master_port 30001 (or another unused port) to the start command, for example:
python -m torch.distributed.launch --nproc_per_node 8 --master_port 30002 tools/train.py ...
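To pick a value for --master_port that is actually unused, a quick check on the master node can help. The snippet below is a minimal sketch using only the Python standard library; the helper names is_port_free and find_free_port are hypothetical and not part of YOLOv6 or PyTorch. It assumes the error comes from another process (possibly a leftover from an earlier run) already holding the port.

```python
import socket


def is_port_free(port: int, host: str = "") -> bool:
    """Return True if a TCP socket can currently be bound to (host, port).

    An empty host means "all interfaces". Hypothetical helper for illustration.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: something else is already bound to the port.
            return False
        return True


def find_free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    for candidate in (30001, 30002):
        state = "free" if is_port_free(candidate) else "in use"
        print(f"port {candidate}: {state}")
    print("a port the OS reports as free:", find_free_port())
```

The port can still be taken by another process between this check and the actual launch, so this only narrows things down. If a port stays in use after a crashed run, it is worth checking for leftover training processes on that machine.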
I have tried this, but I hit the same problem.
Training on a single GPU works.
My environment is torch 1.8.1+cuda90.cudnn7.6.5 with Python 3.6. Could that be the cause?
That generally has no effect. Could you share the nvidia-smi output and a screenshot of the full error?