YOLOX
ncclUnhandledCudaError: Call to CUDA function failed.
Traceback (most recent call last):
  File "/home/psdz/anaconda3/envs/yolox/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/psdz/YOLOX/yolox/core/launch.py", line 91, in _distributed_worker
    comm.synchronize()
  File "/home/psdz/YOLOX/yolox/utils/dist.py", line 48, in synchronize
    dist.barrier()
  File "/home/psdz/anaconda3/envs/yolox/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 2524, in barrier
    work = default_pg.barrier(opts=opts)
RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:38, unhandled cuda error, NCCL version 2.7.8
https://github.com/Megvii-BaseDetection/YOLOX/issues/147
issue as above:
I hit the same bug and am working on it. Can you help me out?
I hit the same issue while training yolox_nano.