
[Err]: RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

Open CaptainEven opened this issue 2 years ago • 2 comments

```
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/diske/even/DenseCL/openselfsup/models/densecl.py", line 279, in forward
    return self.forward_train(img, **kwargs)
  File "/mnt/diske/even/DenseCL/openselfsup/models/densecl.py", line 200, in forward_train
    im_k, idx_unshuffle = self._batch_shuffle_ddp(im_k)
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/diske/even/DenseCL/openselfsup/models/densecl.py", line 132, in _batch_shuffle_ddp
    x_gather = concat_all_gather(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/diske/even/DenseCL/openselfsup/models/densecl.py", line 297, in concat_all_gather
    for _ in range(torch.distributed.get_world_size())
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 748, in get_world_size
    return _get_group_size(group)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 274, in _get_group_size
    default_pg = _get_default_group()
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 358, in _get_default_group
    raise RuntimeError("Default process group has not been initialized, "
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
```
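For context on where the trace bottoms out: `concat_all_gather` calls `torch.distributed.get_world_size()`, which raises this error whenever no default process group exists (e.g. when the script is launched as a plain single process instead of through a DDP launcher). A minimal sketch of one workaround is to initialize a single-process `gloo` group before the model runs; the env-var defaults and the standalone launch here are assumptions, not the repo's intended setup:

```python
import os
import torch
import torch.distributed as dist

def ensure_process_group():
    # concat_all_gather calls dist.get_world_size(), which raises the
    # RuntimeError above unless a default group has been initialized.
    if not dist.is_initialized():
        # Assumed defaults for a single-machine, single-process run.
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        dist.init_process_group(backend="gloo", rank=0, world_size=1)

ensure_process_group()

# With world_size == 1, all_gather just returns the local tensor,
# so the gather-based shuffle degenerates to a no-op.
x = torch.arange(4.0)
gathered = [torch.zeros_like(x) for _ in range(dist.get_world_size())]
dist.all_gather(gathered, x)
out = torch.cat(gathered, dim=0)
print(out.tolist())  # -> [0.0, 1.0, 2.0, 3.0]
```

Note that with `world_size=1` the batch shuffle across GPUs becomes a no-op, so this only makes the code run; it does not reproduce multi-GPU training behavior.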

CaptainEven avatar Sep 14 '22 07:09 CaptainEven

Modify the config: `RESNETS: NORM: "SyncBN"` -> `"BN"`
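The config change above amounts to replacing SyncBatchNorm, which needs an initialized process group, with plain BatchNorm. As a module-level illustration of the same swap, here is a hypothetical helper (`revert_sync_batchnorm` is not part of DenseCL) that rewrites an already-built model:

```python
import torch
import torch.nn as nn

def revert_sync_batchnorm(module: nn.Module) -> nn.Module:
    # Hypothetical helper: replace every SyncBatchNorm with a plain
    # BatchNorm2d, carrying over its parameters and running statistics.
    # This mirrors the suggested config change ("SyncBN" -> "BN").
    out = module
    if isinstance(module, nn.SyncBatchNorm):
        out = nn.BatchNorm2d(module.num_features, module.eps, module.momentum,
                             module.affine, module.track_running_stats)
        if module.affine:
            out.weight.data = module.weight.data.clone()
            out.bias.data = module.bias.data.clone()
        if module.track_running_stats:
            out.running_mean = module.running_mean
            out.running_var = module.running_var
    for name, child in module.named_children():
        out.add_module(name, revert_sync_batchnorm(child))
    return out

model = nn.Sequential(nn.Conv2d(3, 4, 3), nn.SyncBatchNorm(4))
model = revert_sync_batchnorm(model)
```

After the swap, the model no longer touches `torch.distributed` in its norm layers, so a forward pass works without `init_process_group` (at the cost of per-GPU instead of cross-GPU batch statistics).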

sqdcbbj avatar Mar 02 '23 02:03 sqdcbbj

That doesn't really work. Has anyone been able to solve this problem?

gabys-tb avatar Jun 16 '24 17:06 gabys-tb