quantized.pytorch

crash during training

Open binbinmeng opened this issue 6 years ago • 2 comments

```
TRAINING - Epoch: [0][410/446] Time 0.602 (0.622) Data 0.000 (0.005) Loss 4.0999 (5.5282) Prec@1 2.344 (3.435) Prec@5 19.531 (14.536)
TRAINING - Epoch: [0][420/446] Time 0.602 (0.622) Data 0.000 (0.005) Loss 4.1251 (5.4952) Prec@1 3.906 (3.459) Prec@5 20.312 (14.664)
TRAINING - Epoch: [0][430/446] Time 0.611 (0.621) Data 0.000 (0.005) Loss 4.0770 (5.4635) Prec@1 3.125 (3.478) Prec@5 24.219 (14.813)
TRAINING - Epoch: [0][440/446] Time 0.600 (0.621) Data 0.000 (0.005) Loss 4.0965 (5.4331) Prec@1 7.031 (3.515) Prec@5 19.531 (14.948)
Traceback (most recent call last):
  File "main.py", line 305, in <module>
    main()
  File "main.py", line 187, in main
    train_loader, model, criterion, epoch, optimizer)
  File "main.py", line 293, in train
    training=True, optimizer=optimizer)
  File "main.py", line 249, in forward
    output = model(inputs)
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/pytorch-quantization/quantized.pytorch/models/resnet_quantized.py", line 148, in forward
    x = self.layer3(x)
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/pytorch-quantization/quantized.pytorch/models/resnet_quantized.py", line 56, in forward
    out = self.bn1(out)
  File "/opt/conda/envs/pytorch-py3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/pytorch-quantization/quantized.pytorch/models/modules/quantize.py", line 272, in forward
    y = y.view(C, self.num_chunks, B * H * W // self.num_chunks)
RuntimeError: invalid argument 2: size '[256 x 16 x 134]' is invalid for input with 551936 elements at ../src/TH/THStorage.cpp:40
```

binbinmeng avatar Oct 15 '18 05:10 binbinmeng

I encountered the same issue. I think it happens because the number of elements in the batch is not a multiple of (C * self.num_chunks), so the `view` call cannot split the tensor evenly. It does not show up until the last step of an epoch, where the final batch is smaller than the others.
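A minimal sketch of the failure mode described above: the quantized BatchNorm in `quantize.py` reshapes each channel's activations into `self.num_chunks` groups, which silently assumes `B * H * W` is divisible by `num_chunks`. The shapes below are chosen to reproduce the exact numbers in the error message (256 channels, 551936 = 256 * 2156 elements); they are illustrative, not taken from the repo's config.

```python
import torch

C, num_chunks = 256, 16

# Full batch: B * H * W = 64 * 4 * 8 = 2048, divisible by 16 -> view succeeds.
y = torch.randn(C, 64 * 4 * 8)
y.view(C, num_chunks, 64 * 4 * 8 // num_chunks)

# Last, smaller batch: B * H * W = 49 * 4 * 11 = 2156, NOT divisible by 16.
# Integer division gives 134, so the target shape [256 x 16 x 134] holds only
# 548864 elements while the input has 256 * 2156 = 551936 -> RuntimeError.
y = torch.randn(C, 49 * 4 * 11)
try:
    y.view(C, num_chunks, 49 * 4 * 11 // num_chunks)
except RuntimeError as e:
    print("view failed:", e)
```

One possible workaround (an assumption on my side, not a fix from the maintainers) is to drop the incomplete final batch so every step sees the same `B`, e.g. `DataLoader(dataset, batch_size=64, drop_last=True)`; a proper fix would make the chunking in `quantize.py` handle a remainder.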

amjltc295 avatar May 19 '19 00:05 amjltc295

Seems to be a bug.

amjltc295 avatar May 19 '19 00:05 amjltc295