RNNPose

RuntimeError: NCCL error

Open • AramNasser opened this issue 1 year ago • 1 comment

When running the eval.py script with "--use_dist True", I am facing this error: RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1607370128159/work/torch/lib/c10d/ProcessGroupNCCL.cpp:784, unhandled system error, NCCL version 2.7.8

I am using this Docker image: "nvcr.io/nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04", since the one mentioned in the original Dockerfile is no longer available on Docker Hub.

Any suggestions about what the problem could be? Thank you in advance.

AramNasser, Mar 07 '24 11:03

Hey @AramNasser, have you resolved this issue? I am getting the same error.

Nishanth21D, Aug 15 '24 12:08
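
The "unhandled system error" message above does not say what actually failed, so a possible first step is to turn on NCCL's own logging and rerun. Below is a minimal sketch of that, not a confirmed fix for this issue. `NCCL_DEBUG` is a documented NCCL environment variable; the helper name `nccl_debug_env` is hypothetical and is not part of RNNPose or PyTorch.

```python
import os


def nccl_debug_env(base=None):
    """Return a copy of an environment mapping with NCCL logging enabled.

    Hypothetical helper for illustration only.
    """
    # Copy so the caller's mapping (or os.environ) is left unmodified.
    env = dict(os.environ if base is None else base)
    # With INFO, NCCL logs initialization details and the underlying
    # cause behind a generic "unhandled system error".
    env["NCCL_DEBUG"] = "INFO"
    return env


if __name__ == "__main__":
    # Print the setting so it can be exported before launching eval.py.
    print("NCCL_DEBUG=" + nccl_debug_env()["NCCL_DEBUG"])
```

The returned mapping can be passed as `env=` to `subprocess.run` when launching the same eval.py command that produced the error, or the variable can simply be exported in the shell inside the container. The extra log lines usually narrow down whether the failure is in shared memory, networking, or GPU visibility.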