pytorch-deeplab-xception

ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])

Open YadongLau opened this issue 5 years ago • 5 comments

When I train my COCO-style dataset using `bash train_coco.sh`, I get the following errors:

```
Namespace(backbone='resnet', base_size=513, batch_size=4, checkname='deeplab-resnet', crop_size=513, cuda=True, dataset='coco', epochs=10, eval_interval=1, freeze_bn=False, ft=False, gpu_ids=[0], loss_type='ce', lr=0.01, lr_scheduler='poly', momentum=0.9, nesterov=False, no_cuda=False, no_val=False, out_stride=16, resume=None, seed=1, start_epoch=0, sync_bn=False, test_batch_size=4, use_balanced_weights=False, use_sbd=True, weight_decay=0.0005, workers=4)
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Using poly LR Scheduler!
Starting Epoch: 0
Total Epoches: 10
  0%|          | 0/1 [00:00<?, ?it/s]
=>Epoches 0, learning rate = 0.0100, previous best = 0.0000
Traceback (most recent call last):
  File "train.py", line 306, in <module>
    main()
  File "train.py", line 299, in main
    trainer.training(epoch)
  File "train.py", line 104, in training
    output = self.model(image)
  File "/home/dachen/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dachen/conda/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/dachen/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dachen/Music/pytorch-deeplab-xception/modeling/deeplab.py", line 30, in forward
    x = self.aspp(x)
  File "/home/dachen/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dachen/Music/pytorch-deeplab-xception/modeling/aspp.py", line 70, in forward
    x5 = self.global_avg_pool(x)
  File "/home/dachen/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dachen/conda/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/dachen/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/dachen/conda/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 81, in forward
    exponential_average_factor, self.eps)
  File "/home/dachen/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 1666, in batch_norm
    raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])
```

What should I do?
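For context, the error comes from a BatchNorm2d layer that receives a single sample already pooled to 1x1, leaving only one value per channel to normalize. It can be reproduced in isolation; a minimal sketch, independent of this repo:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(256)
bn.train()  # training mode: batch statistics need more than 1 value per channel

x = torch.randn(1, 256, 1, 1)  # a single sample, spatially pooled to 1x1
bn(x)  # raises: ValueError: Expected more than 1 value per channel when training
```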

YadongLau · Dec 10 '19 07:12

@jfzhang95 @lyd953621450 Hi, I have faced the same problem. Are you training your own dataset? I think the key is the labels.

HuangLian126 · Dec 13 '19 03:12

@lyd953621450 @HuangLian126 I think the reason is that the BatchNorm layer in global_avg_pool requires the batch size to be larger than 1. If you have already set a batch size larger than 1 and still face this problem, it is probably because the number of training samples modulo the batch size equals 1, so the last batch of every epoch contains a single sample. In that case, I suggest setting the drop_last flag of the DataLoader to True to drop that final single-sample batch, as in the sketch below.
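For illustration, a self-contained sketch with toy tensors (not this repo's actual loader code) showing the effect of drop_last=True:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Toy dataset of 9 samples: with batch_size=4 the last batch would hold
# a single sample, which crashes BatchNorm layers in training mode.
images = torch.randn(9, 3, 64, 64)
labels = torch.zeros(9, dtype=torch.long)
dataset = TensorDataset(images, labels)

# drop_last=True discards the final incomplete batch entirely.
loader = DataLoader(dataset, batch_size=4, shuffle=True, drop_last=True)
for batch_images, batch_labels in loader:
    print(batch_images.shape)  # always torch.Size([4, 3, 64, 64])
```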

hlwang1124 · Dec 20 '19 17:12

Set the batch size larger than 1.

linzhenyuyuchen · Jan 16 '20 08:01

@hlwang1124 I have the same issue. How can I set the drop_last flag to True?
Also, if there is a problem in my dataset, could that trigger this issue?

kimsu1219 · Aug 25 '21 11:08

```
python train.py --backbone xception --lr 0.0001 --epochs 10 --batch-size 2 --gpu-ids 0 --checkname deeplab-xception
```

Hi, I still get this error even though my batch size is 2. Have you solved this problem? @YadongLau @kimsu1219 @HuangLian126
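A likely cause, assuming the number of training samples is odd: with a batch size of 2 the last batch then contains a single sample, hitting the same remainder problem described above. A quick check with hypothetical numbers:

```python
# Hypothetical numbers, for illustration only.
num_train_samples = 1001  # e.g. the size of your training set
batch_size = 2

# If the remainder is 1, the final batch holds exactly one sample and
# BatchNorm raises this error unless drop_last=True is set on the DataLoader.
print(num_train_samples % batch_size)  # -> 1
```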

SpectorSong · Mar 25 '22 03:03