YOLOX
[W ProcessGroupNCCL.cpp:1569] Error appears when using multiple GPUs on one machine.
I tried to train with 2 GPUs, but it failed. Can anyone help me?
2021-09-02 21:49:27.319 | INFO | yolox.core.launch:_distributed_worker:116 - Rank 1 initialization finished.
2021-09-02 21:49:27.322 | INFO | yolox.core.launch:_distributed_worker:116 - Rank 0 initialization finished.
[W ProcessGroupNCCL.cpp:1569] Rank 0 using best-guess GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1569] Rank 1 using best-guess GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
Terminated
I also ran into an NCCL problem when using multiple GPUs. The fix suggested online is to use DDP and turn off NCCL. Has your problem been solved?
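The warning in the log asks for `device_ids` to be passed to `barrier()`. Here is a minimal sketch of what that could look like, assuming PyTorch 1.8 or later (where `torch.distributed.barrier` accepts `device_ids`) and assuming each process knows its local rank. `barrier_kwargs` and `synchronize` are hypothetical helper names for illustration, not YOLOX functions. Note that the `[W ...]` lines are warnings, so this may only silence them; whether it also resolves the `Terminated` exit is not confirmed.

```python
def barrier_kwargs(backend, local_rank):
    """Build keyword arguments for torch.distributed.barrier().

    Hypothetical helper: device_ids is only meaningful for the NCCL
    backend, so other backends get no extra arguments.
    """
    if backend == "nccl":
        return {"device_ids": [local_rank]}
    return {}


def synchronize(local_rank):
    """Barrier across all ranks, pinning this rank to its own GPU.

    Hypothetical helper: torch is imported lazily so the module can be
    loaded on a machine without PyTorch installed.
    """
    import torch
    import torch.distributed as dist

    # Nothing to do when not running in distributed mode.
    if not (dist.is_available() and dist.is_initialized()):
        return

    backend = dist.get_backend()
    if backend == "nccl":
        # Bind the process to its GPU before any collective call,
        # so NCCL does not have to guess the rank-to-GPU mapping.
        torch.cuda.set_device(local_rank)

    dist.barrier(**barrier_kwargs(backend, local_rank))
```

Each worker would call `synchronize(local_rank)` after the process group is initialized, with `local_rank` being the GPU index assigned to that worker on the machine.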