RNNPose
RuntimeError: NCCL error
When I run the eval.py script with "--use_dist True", I get this error: RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1607370128159/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, unhandled system error, NCCL version 2.7.8
I am using this Docker image: "nvcr.io/nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04", since the one mentioned in the original Dockerfile is no longer available on Docker Hub.
Any suggestions about what the problem could be? Thank you in advance.
Hey @AramNasser, have you resolved this issue? I am getting the same error.