mmdetection
torch.distributed.elastic.multiprocessing.errors.ChildFailedError
When training with mmdet 3.x on a single machine with multiple GPUs, this distributed error is reported every time after the third epoch of training. How can this be solved?
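`ChildFailedError` is raised by the launcher (`torchrun` / `torch.distributed.launch`). It usually only reports that a worker process exited, and the summary often says the traceback is unavailable. The PyTorch elastic documentation suggests decorating the training entry point with `@record` so that the worker's real exception is written to the error file that the launcher includes in its summary. The sketch below shows the pattern. It assumes you can edit your own entry point (for example, the `main()` in `tools/train.py`). The `main` function here is a hypothetical stand-in with a placeholder loop, not mmdetection code.

```python
# Sketch: surface the worker exception hidden behind ChildFailedError.
# Assumption: `main` is a hypothetical stand-in for your real training entry point.
from torch.distributed.elastic.multiprocessing.errors import record


@record  # on an exception, records the worker traceback for the launcher, then re-raises
def main(num_epochs: int = 3) -> int:
    # Placeholder for the real training loop; returns the number of epochs run.
    completed = 0
    for _ in range(num_epochs):
        completed += 1
    return completed


if __name__ == "__main__":
    main()
```

With the decorator in place, relaunching the same multi-GPU command should show the original exception from the failing rank instead of only the generic `ChildFailedError`. That exception is what points to the actual cause.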
I have the same issue.
I also have the same question. Have you solved it?
Me too
Same error here!
Same.
How can this be solved?
Try https://github.com/pytorch/pytorch/issues/121222
Related issue: https://github.com/open-mmlab/mmdetection/issues/6934#issuecomment-1066255179