BEVFormer
NCCL Error on WSL2
When I run training or testing of the model on a single GPU (e.g. ./tools/fp16/dist_train.sh ./projects/configs/bevformer_fp16/bevformer_tiny_fp16.py 1), I get this error in both cases:
RuntimeError: NCCL error in: ../torch/lib/c10d/ProcessGroupNCCL.cpp:911, unhandled system error, NCCL version 2.7.8
ncclSystemError: System call (socket, malloc, munmap, etc) failed.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 439853) of binary
Do you know how to fix this? P.S. I am running it on WSL2.