GFPGAN

Unable to start training: NCCL library error

Open · nowfalcodmeric opened this issue 4 years ago · 2 comments

The following error was printed three times, with the copies interleaved in the console output:

```
RuntimeError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:957, invalid usage, NCCL version 21.0.3
ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).
```

nowfalcodmeric · Feb 03 '22 05:02

Maybe the number of GPUs you have does not match the parameter setting.

Asuka001100 · Feb 08 '22 12:02
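One way to read that suggestion is as a minimal sketch, under stated assumptions. Suppose the training script picks its GPU as `rank % torch.cuda.device_count()`, the pattern used by BasicSR-style `init_dist` helpers, which GFPGAN builds on (verify this against your checkout). If the launcher then starts more processes than there are visible GPUs, for example `--nproc_per_node=4` on a machine where only 2 GPUs are visible, two ranks select the same device. Recent NCCL releases reject a communicator in which two ranks share one GPU as invalid usage. The helper names below (`rank_to_device`, `shared_devices`, `check_launch`) are hypothetical and are not part of GFPGAN or PyTorch.

```python
"""Sanity-check a torch.distributed launch before NCCL initializes.

Assumption (hypothetical helpers): each worker selects its GPU as
rank % num_gpus, as BasicSR-style init_dist helpers do. If more processes
are launched than GPUs are visible, several ranks land on one device.
"""
from __future__ import annotations

import os
from collections import defaultdict


def rank_to_device(world_size: int, num_gpus: int) -> dict[int, int]:
    """Map every rank to the CUDA device index it would select."""
    if num_gpus < 1:
        raise ValueError("no visible GPUs")
    return {rank: rank % num_gpus for rank in range(world_size)}


def shared_devices(world_size: int, num_gpus: int) -> dict[int, list[int]]:
    """Return {device: [ranks]} for every device claimed by more than one rank."""
    by_device: dict[int, list[int]] = defaultdict(list)
    for rank, device in rank_to_device(world_size, num_gpus).items():
        by_device[device].append(rank)
    return {dev: ranks for dev, ranks in by_device.items() if len(ranks) > 1}


def check_launch(nproc_per_node: int, num_gpus: int) -> None:
    """Raise if the requested process count would put two ranks on one GPU."""
    clashes = shared_devices(nproc_per_node, num_gpus)
    if clashes:
        raise RuntimeError(
            f"nproc_per_node={nproc_per_node} but only {num_gpus} GPU(s) visible; "
            f"devices shared by several ranks: {clashes}"
        )


if __name__ == "__main__":
    import torch  # imported here so the helpers above work without torch

    # Single-node assumption: WORLD_SIZE (set by the launcher) equals
    # nproc_per_node. device_count() already honours CUDA_VISIBLE_DEVICES.
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    check_launch(world_size, torch.cuda.device_count())
    print("launch configuration OK")
```

If the check fails, one possible fix is to lower `--nproc_per_node` to the value reported by `torch.cuda.device_count()`, or to expose more GPUs through `CUDA_VISIBLE_DEVICES`. Separately, setting the environment variable `NCCL_DEBUG=INFO` before launching makes NCCL log the specific reason behind an `ncclInvalidUsage` error. For this cause, that log typically mentions a duplicate GPU.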

@Asuka001100 I couldn't understand what you meant.

ucalyptus2 · Dec 07 '22 14:12