YOLOX

[W ProcessGroupNCCL.cpp:1569] Error appears when using multiple GPUs on one machine.

Open: dahaiyidi opened this issue 3 years ago • 1 comment

I tried to use 2 GPUs, but it failed. Can anyone help me?

2021-09-02 21:49:27.319 | INFO | yolox.core.launch:_distributed_worker:116 - Rank 1 initialization finished.
2021-09-02 21:49:27.322 | INFO | yolox.core.launch:_distributed_worker:116 - Rank 0 initialization finished.
[W ProcessGroupNCCL.cpp:1569] Rank 0 using best-guess GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
[W ProcessGroupNCCL.cpp:1569] Rank 1 using best-guess GPU 1 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect.Specify device_ids in barrier() to force use of a particular device.
Terminated

dahaiyidi, Sep 03 '21 15:09

I also ran into an NCCL problem when using multiple GPUs. The fix suggested online is to use DDP and disable NCCL. Has your problem been solved?

1605707467qq, Jun 22 '22 07:06
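
The warning in the log above asks for device_ids to be passed to barrier(). A minimal sketch of that suggestion follows. It is not taken from YOLOX: the helper names local_rank_from_env and barrier_on_own_gpu are hypothetical, and reading LOCAL_RANK from the environment assumes a torchrun-style launcher, which may not match how YOLOX passes the local rank. The warning may also be unrelated to the final "Terminated" line, so this may silence the message without fixing the crash.

```python
import os


def local_rank_from_env(default: int = 0) -> int:
    """Read the per-node rank from LOCAL_RANK (assumed to be set by the launcher)."""
    return int(os.environ.get("LOCAL_RANK", default))


def barrier_on_own_gpu(local_rank: int) -> None:
    """Bind this process to its GPU, then synchronize on that device explicitly."""
    # Imported lazily so the helper above works even without PyTorch installed.
    import torch
    import torch.distributed as dist

    # Pin the process to its GPU so later NCCL calls do not have to guess.
    torch.cuda.set_device(local_rank)
    # device_ids is only honored by the NCCL backend (PyTorch 1.8 or newer).
    dist.barrier(device_ids=[local_rank])
```

barrier_on_own_gpu(local_rank_from_env()) would be called in each worker after the process group has been initialized. If NCCL itself is the suspect, torch.distributed.init_process_group also accepts backend="gloo", which works with DDP but is usually slower for GPU training.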