SegFormer
Training SegFormer-B1 on the Cityscapes dataset fails with "Default process group is not initialized"
Hi, when I train SegFormer-B1 on cityscapes:
Pretrained model path:
./SegFormer/pretrained/mit_b1.pth
Command:
cd SegFormer
python tools/train.py local_configs/segformer/B1/segformer.b1.1024x1024.city.160k.py
Error message:
Traceback (most recent call last):
File "tools/train.py", line 166, in
How can I solve this problem? Thanks.
Please use 'dist_train.sh' instead of 'train.py'. The model uses 'SyncBN', which requires PyTorch DDP.
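For anyone wondering what the launcher changes, here is a rough sketch of the command a `dist_train.sh`-style script typically assembles in MMSegmentation-based repos: one process per GPU via `torch.distributed.launch`, with `--launcher pytorch` passed to `tools/train.py` so the default process group gets initialized. The helper name `build_dist_train_cmd` and the default port are hypothetical, and the exact contents of this repo's script are an assumption, so check `tools/dist_train.sh` itself.

```python
import sys
from typing import List


def build_dist_train_cmd(config: str, num_gpus: int, port: int = 29500) -> List[str]:
    """Build an argv list resembling what tools/dist_train.sh is assumed to run."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        sys.executable, "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}",  # one worker process per GPU
        f"--master_port={port}",         # hypothetical default port
        "tools/train.py", config,
        "--launcher", "pytorch",         # tells train.py to initialize distributed mode
    ]


if __name__ == "__main__":
    cmd = build_dist_train_cmd(
        "local_configs/segformer/B1/segformer.b1.1024x1024.city.160k.py",
        num_gpus=1,
    )
    print(" ".join(cmd))
```

With `num_gpus=1` the job still goes through the distributed launcher, so SyncBN has a process group (of size one) to query.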
Thanks, but when I train SegFormer-B1 on Cityscapes as you suggested:
Command:
./tools/dist_train.sh local_configs/segformer/B1/segformer.b1.1024x1024.city.160k.py
Error message:
Traceback (most recent call last):
File "./tools/train.py", line 180, in
How can I solve this problem? Thanks, and I hope to get a reply.
Hi, I have the same problem. I want to train the model on a single GPU because I have only one, but it reported the same error after I changed 'SyncBN' to 'BN' in norm_cfg. Can you help me solve it? Thank you very much.
Traceback (most recent call last):
File "tools/train.py", line 167, in
You can also try replacing all SyncBN with BN in the model. Modify the config and segformer_head.
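To check that no 'SyncBN' is left anywhere, a small scan of the source tree can help. This is only a sketch: the function name `find_syncbn` is hypothetical, and the directories scanned in the usage part are assumptions about the repo layout.

```python
from pathlib import Path
from typing import List, Tuple


def find_syncbn(root: str, needle: str = "SyncBN") -> List[Tuple[str, int, str]]:
    """Return (file, line number, line text) for every .py line containing `needle`."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file, skip it
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Directory names are assumptions; adjust them to where configs and models live.
    for directory in ("local_configs", "mmseg/models"):
        for file, lineno, line in find_syncbn(directory):
            print(f"{file}:{lineno}: {line}")
```

Any remaining hit in a file that the chosen config actually uses (base config, backbone, or head) is a candidate for the same error.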
Hi, I have the same problem. I want to train the model on a single GPU because I have only one, but it reported the same error after I changed 'SyncBN' to 'BN' in norm_cfg. Can you help me solve it? Thank you very much.
Traceback (most recent call last):
  File "tools/train.py", line 167, in <module>
    main()
  File "tools/train.py", line 163, in main
    meta=meta)
  File "/home/guzhengjie/Demo/SegFormer/mmseg/apis/train.py", line 115, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "/home/guzhengjie/anaconda3/envs/segformer/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 131, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/home/guzhengjie/anaconda3/envs/segformer/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 60, in train
    outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
  File "/home/guzhengjie/anaconda3/envs/segformer/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/guzhengjie/Demo/SegFormer/mmseg/models/segmentors/base.py", line 152, in train_step
    losses = self(**data_batch)
  File "/home/guzhengjie/anaconda3/envs/segformer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/guzhengjie/anaconda3/envs/segformer/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 84, in new_func
    return old_func(*args, **kwargs)
  File "/home/guzhengjie/Demo/SegFormer/mmseg/models/segmentors/base.py", line 122, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/home/guzhengjie/Demo/SegFormer/mmseg/models/segmentors/encoder_decoder.py", line 153, in forward_train
    x = self.extract_feat(img)
  File "/home/guzhengjie/Demo/SegFormer/mmseg/models/segmentors/encoder_decoder.py", line 79, in extract_feat
    x = self.backbone(img)
  File "/home/guzhengjie/anaconda3/envs/segformer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/guzhengjie/Demo/SegFormer/mmseg/models/backbones/resnet.py", line 635, in forward
    x = self.stem(x)
  File "/home/guzhengjie/anaconda3/envs/segformer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/guzhengjie/anaconda3/envs/segformer/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/guzhengjie/anaconda3/envs/segformer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/guzhengjie/anaconda3/envs/segformer/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 493, in forward
    world_size = torch.distributed.get_world_size(process_group)
  File "/home/guzhengjie/anaconda3/envs/segformer/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 620, in get_world_size
    return _get_group_size(group)
  File "/home/guzhengjie/anaconda3/envs/segformer/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 219, in _get_group_size
    _check_default_pg()
  File "/home/guzhengjie/anaconda3/envs/segformer/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 210, in _check_default_pg
    "Default process group is not initialized"
AssertionError: Default process group is not initialized
Hello, did you succeed in it? :)
I also hit this issue. After I replaced 'SyncBN' with 'BN' in "segformer.b1.512x512.ade.160k.py", "segformer.py" and "segformer_head.py", the error disappeared.
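For the config side of that replacement, here is a minimal sketch that swaps the norm type in a nested config dict without editing files by hand. The function name `replace_norm_type` and the sample config are illustrative, not taken from the repo. Note it only reaches values that live in the config; a norm type written directly into a model file such as the head still has to be edited there, as described in this thread.

```python
from typing import Any


def replace_norm_type(cfg: Any, old: str = "SyncBN", new: str = "BN") -> Any:
    """Return a copy of a nested dict/list config with norm type `old` swapped for `new`."""
    if isinstance(cfg, dict):
        out = {key: replace_norm_type(value, old, new) for key, value in cfg.items()}
        if out.get("type") == old:
            out["type"] = new
        return out
    if isinstance(cfg, (list, tuple)):
        return type(cfg)(replace_norm_type(item, old, new) for item in cfg)
    return cfg  # scalars and other objects are returned unchanged


if __name__ == "__main__":
    # Illustrative config fragment, not copied from the repo.
    model = dict(
        backbone=dict(type="mit_b1"),
        decode_head=dict(
            type="SegFormerHead",
            norm_cfg=dict(type="SyncBN", requires_grad=True),
        ),
    )
    patched = replace_norm_type(model)
    print(patched["decode_head"]["norm_cfg"])  # {'type': 'BN', 'requires_grad': True}
```

The function returns plain dicts and leaves its input untouched, so the original config can still be used for multi-GPU runs.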
The seg head also needs to be changed to BN.