
Training issue on num_segment=12

Open Fazlik995 opened this issue 4 years ago • 3 comments

Hi, I successfully ran your network on Something-V1 with num_segment=8.

However, when I use num_segment=12, I receive the following error after the 1st epoch: RuntimeError: shape '[-1, 8, 4, 27, 27]' is invalid for input of size 34992

Any ideas?

Fazlik995 avatar Aug 24 '20 01:08 Fazlik995

Hi,

Please post the script you used for training and the line where the error occurs. Also, does it occur during validation or during training?

swathikirans avatar Aug 24 '20 07:08 swathikirans

Script: python3 main.py something-v1 RGB --arch InceptionV3 --num_segments 12 --consensus_type avg --batch-size 16 --iter_size 2 --dropout 0.5 --lr 0.01 --warmup 10 --epochs 60 --eval-freq 5 --gd 20 --run_iter 1 -j 16 --npb --gsm

error:

Epoch: [0][5360/5377], lr: 0.00100 Time 2.036 (0.944) Data 1.384 (0.325) Loss 2.3372 (3.6284) Prec@1 6.250 (3.454) Prec@5 6.250 (12.217)
Traceback (most recent call last):
  File "main.py", line 385, in <module>
    main()
  File "main.py", line 165, in main
    train_prec1 = train(train_loader, model, criterion, optimizer, epoch, log_training, writer=writer)
  File "main.py", line 220, in train
    output = model(input_var)
  File "/home/fazlik/python36/local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/fazlik/python36/local/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/fazlik/python36/local/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/fazlik/python36/local/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
    raise output
  File "/home/fazlik/python36/local/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
    output = module(*input, **kwargs)
  File "/home/fazlik/python36/local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/CVIP/GSM/models.py", line 194, in forward
    base_out = self.base_model(input.view((-1, sample_len) + input.size()[-2:]))
  File "/home/fazlik/python36/local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/CVIP/GSM/model_zoo/bninception/pytorch_load.py", line 121, in forward
    data_dict[op[2]] = getattr(self, op[0])(data_dict[op[-1]])
  File "/home/fazlik/python36/local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/CVIP/GSM/gsm.py", line 243, in forward
    x = self.cam1(x)
  File "/home/fazlik/python36/local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/CVIP/GSM/gsm.py", line 70, in forward
    reshape_bottleneck = bottleneck.view((-1, self.n_segment) + bottleneck.size()[1:]) # n, t, c, h, w
RuntimeError: shape '[-1, 8, 4, 27, 27]' is invalid for input of size 34992
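For what it's worth, the numbers in the error line up as follows (a quick sketch using only the shapes reported in the traceback; the per-frame size 4 x 27 x 27 is read off the target shape '[-1, 8, 4, 27, 27]'):

```python
# Shape arithmetic behind the RuntimeError, using only numbers from the
# traceback above. The failing view asks for chunks of n_segment=8 frames,
# but this tensor holds a number of frames that is not a multiple of 8.
total_elements = 34992             # "input of size 34992" from the error
per_frame = 4 * 27 * 27            # c * h * w from '[-1, 8, 4, 27, 27]'
n_frames = total_elements // per_frame

print(n_frames)                    # 12 frames reached this GSM layer
print(n_frames % 8)                # 4 -> view((-1, 8, c, h, w)) must fail
print(n_frames % 12)               # 0 -> n_segment=12 would reshape cleanly
```

One hypothesis consistent with this: a full batch of 16 clips at 12 frames each gives 192 frames, which happens to be divisible by 8, so a GSM layer still configured with n_segment=8 would reshape without error (if incorrectly) on full batches, and only crash when an incomplete batch or an uneven DataParallel split leaves a frame count that 8 does not divide. That would explain why the error appears only near the end of the epoch.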

I got this error after the 1st epoch.

Any ideas?

Fazlik995 avatar Aug 25 '20 14:08 Fazlik995

Hi,

It seems like the error is triggered somewhere that is not part of the original implementation shared in this repo, so I am unable to help in this case. I would suggest trying "drop_last=True" in the data_loader definition.
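For context, a minimal sketch of that workaround, assuming a standard torch.utils.data.DataLoader (the dataset here is a stand-in, not the repo's actual loader):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset: 20 samples with batch_size=16 leaves a final batch of 4.
dataset = TensorDataset(torch.randn(20, 3))

# drop_last=True discards that incomplete final batch, so every batch that
# reaches the model has exactly batch_size samples.
loader = DataLoader(dataset, batch_size=16, drop_last=True)
print(len(loader))       # 1

loader_keep = DataLoader(dataset, batch_size=16, drop_last=False)
print(len(loader_keep))  # 2
```

Dropping the short final batch keeps the per-batch frame count fixed, which sidesteps shape errors that only surface on the last iterations of an epoch.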

swathikirans avatar Aug 25 '20 15:08 swathikirans