HRNet-Semantic-Segmentation

AttributeError: SyncBatchNorm is only supported within torch.nn.parallel.DistributedDataParallel?

Open EricHuiK opened this issue 4 years ago • 4 comments

EricHuiK avatar Sep 03 '20 08:09 EricHuiK

I also have this problem. Have you solved it yet?

LemonLov avatar Jan 27 '21 14:01 LemonLov

SyncBatchNorm is only meant for multi-GPU parallel training. If you only have one GPU to train on, you can change "torch.nn.SyncBatchNorm" to "torch.nn.BatchNorm2d", which should solve this problem.
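If editing the model source is inconvenient, the same swap can be done on an already-built model. This is only a minimal sketch: `revert_sync_batchnorm` is a hypothetical helper name (not part of PyTorch or this repo), and the small `nn.Sequential` is a stand-in for the real network. It assumes the layers see 4D image tensors, which is what `torch.nn.BatchNorm2d` expects.

```python
import torch
import torch.nn as nn


def revert_sync_batchnorm(module):
    """Hypothetical helper: recursively swap nn.SyncBatchNorm for nn.BatchNorm2d."""
    converted = module
    if isinstance(module, nn.SyncBatchNorm):
        converted = nn.BatchNorm2d(
            module.num_features,
            eps=module.eps,
            momentum=module.momentum,
            affine=module.affine,
            track_running_stats=module.track_running_stats,
        )
        # Both layer types use the same parameter/buffer names,
        # so weights and running statistics carry over directly.
        converted.load_state_dict(module.state_dict())
        converted.train(module.training)
    for name, child in module.named_children():
        converted.add_module(name, revert_sync_batchnorm(child))
    return converted


# Toy stand-in for the real model (assumption: not the HRNet architecture).
toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.SyncBatchNorm(8),
    nn.ReLU(),
)
toy = revert_sync_batchnorm(toy)
out = toy(torch.randn(2, 3, 16, 16))  # runs on a single device, no DDP needed
print(type(toy[1]).__name__, tuple(out.shape))
```

With the real model you would presumably call the helper once after building it and before the first forward pass, e.g. `model = revert_sync_batchnorm(model)`.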

JensenGao avatar Apr 27 '21 06:04 JensenGao

Me too. Here are the details:

=> init weights from normal distribution
Traceback (most recent call last):
  File "tools/test.py", line 139, in <module>
    main()
  File "tools/test.py", line 71, in main
    logger.info(get_model_summary(model.cuda(), dump_input.cuda()))
  File "/users/guozibin/dukaiyang/5.11.depth-net-train/HRNet-Semantic-Segmentation-HRNet-OCR/tools/../lib/utils/modelsummary.py", line 90, in get_model_summary
    model(*input_tensors)
  File "/users/guozibin/.conda/envs/henet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/users/guozibin/dukaiyang/5.11.depth-net-train/HRNet-Semantic-Segmentation-HRNet-OCR/tools/../lib/models/seg_hrnet.py", line 418, in forward
    x = self.bn1(x)
  File "/users/guozibin/.conda/envs/henet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/users/guozibin/.conda/envs/henet/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 438, in forward
    raise AttributeError('SyncBatchNorm is only supported within torch.nn.parallel.DistributedDataParallel')
AttributeError: SyncBatchNorm is only supported within torch.nn.parallel.DistributedDataParallel

alexanderuo avatar Jun 09 '21 13:06 alexanderuo

I also have this problem, and I'm using DDP training:
CUDA_VISIBLE_DEVICES=0,1 python3 -m torch.distributed.launch --nproc_per_node=2 train.py ...

liu09114 avatar Sep 02 '22 08:09 liu09114