pytorch-detect-rfcn

RuntimeError: inconsistent tensor sizes

Open KevinQian97 opened this issue 5 years ago • 0 comments

Hi, sorry to bother you. I ran into a problem when running trainval_net.py with the following configuration:

Namespace(batch_size=2, checkepoch=1, checkpoint=0, checkpoint_interval=10000, checksession=1, class_agnostic=False, cuda=True, dataset='imagenet_vid+imagenet_det', disp_interval=100, large_scale=False, lr=0.001, lr_decay_gamma=0.1, lr_decay_step=5, mGPUs=True, max_epochs=20, net='res101', num_workers=0, optimizer='sgd', resume=False, save_dir='output/models', session=1, start_epoch=1, use_tfboard=False)

The environment is PyTorch 0.3.0, CUDA 8, and cuDNN 7.1.

The program crashed after training for several iterations:

[session 1][epoch 1][iter 100] loss: 1.8615, lr: 1.00e-03 fg/bg=(4/252), time cost: 46.256225 rpn_cls: 0.2007, rpn_box: 0.0260, rcnn_cls: 0.1744, rcnn_box 0.0001
[session 1][epoch 1][iter 200] loss: 1.2458, lr: 1.00e-03 fg/bg=(46/210), time cost: 44.331184 rpn_cls: 0.0870, rpn_box: 0.0658, rcnn_cls: 1.1659, rcnn_box 0.5316
[session 1][epoch 1][iter 300] loss: 1.0413, lr: 1.00e-03 fg/bg=(21/235), time cost: 45.259623 rpn_cls: 0.0649, rpn_box: 0.0936, rcnn_cls: 0.6243, rcnn_box 0.2626
[session 1][epoch 1][iter 400] loss: 1.1367, lr: 1.00e-03 fg/bg=(8/248), time cost: 45.338590 rpn_cls: 0.1623, rpn_box: 0.2133, rcnn_cls: 0.3564, rcnn_box 0.0785
[session 1][epoch 1][iter 500] loss: 1.2001, lr: 1.00e-03 fg/bg=(43/213), time cost: 45.177281 rpn_cls: 0.3296, rpn_box: 0.0614, rcnn_cls: 1.0575, rcnn_box 0.4794
[session 1][epoch 1][iter 600] loss: 1.1716, lr: 1.00e-03 fg/bg=(35/221), time cost: 45.165913 rpn_cls: 0.1592, rpn_box: 0.0326, rcnn_cls: 0.8674, rcnn_box 0.3584
[session 1][epoch 1][iter 700] loss: 1.1964, lr: 1.00e-03 fg/bg=(38/218), time cost: 45.357144 rpn_cls: 0.0814, rpn_box: 0.0552, rcnn_cls: 0.8258, rcnn_box 0.3607
[session 1][epoch 1][iter 800] loss: 1.2329, lr: 1.00e-03 fg/bg=(18/238), time cost: 45.654004 rpn_cls: 0.1239, rpn_box: 0.0149, rcnn_cls: 0.4786, rcnn_box 0.1871
[session 1][epoch 1][iter 900] loss: 1.2289, lr: 1.00e-03 fg/bg=(40/216), time cost: 45.081429 rpn_cls: 0.0440, rpn_box: 0.0146, rcnn_cls: 0.8026, rcnn_box 0.4304
[session 1][epoch 1][iter 1000] loss: 1.0724, lr: 1.00e-03 fg/bg=(18/238), time cost: 45.640391 rpn_cls: 0.0690, rpn_box: 0.0864, rcnn_cls: 0.4955, rcnn_box 0.1855
[session 1][epoch 1][iter 1100] loss: 1.1550, lr: 1.00e-03 fg/bg=(37/219), time cost: 44.558274 rpn_cls: 0.0837, rpn_box: 0.0504, rcnn_cls: 0.8120, rcnn_box 0.3720
[session 1][epoch 1][iter 1200] loss: 1.3002, lr: 1.00e-03 fg/bg=(13/243), time cost: 44.615023 rpn_cls: 0.2130, rpn_box: 0.0238, rcnn_cls: 0.3702, rcnn_box 0.1142
[session 1][epoch 1][iter 1300] loss: 1.1840, lr: 1.00e-03 fg/bg=(47/209), time cost: 44.864319 rpn_cls: 0.1111, rpn_box: 0.0345, rcnn_cls: 0.7959, rcnn_box 0.4722
[session 1][epoch 1][iter 1400] loss: 1.1256, lr: 1.00e-03 fg/bg=(25/231), time cost: 45.451395 rpn_cls: 0.1346, rpn_box: 0.0734, rcnn_cls: 0.6207, rcnn_box 0.2413
[session 1][epoch 1][iter 1500] loss: 1.0402, lr: 1.00e-03 fg/bg=(11/245), time cost: 44.467964 rpn_cls: 0.1048, rpn_box: 0.0308, rcnn_cls: 0.2073, rcnn_box 0.1033
[session 1][epoch 1][iter 1600] loss: 1.1574, lr: 1.00e-03 fg/bg=(31/225), time cost: 45.622758 rpn_cls: 0.1083, rpn_box: 0.0375, rcnn_cls: 0.6225, rcnn_box 0.2407
[session 1][epoch 1][iter 1700] loss: 1.1521, lr: 1.00e-03 fg/bg=(14/242), time cost: 44.946758 rpn_cls: 0.1012, rpn_box: 0.0402, rcnn_cls: 0.2832, rcnn_box 0.1185
[session 1][epoch 1][iter 1800] loss: 1.1786, lr: 1.00e-03 fg/bg=(34/222), time cost: 45.144423 rpn_cls: 0.0801, rpn_box: 0.0906, rcnn_cls: 0.5499, rcnn_box 0.3785

Traceback (most recent call last):
  File "trainval_net.py", line 329, in <module>
    data = next(det_data_iter)
  File "/root/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 188, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/root/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 119, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/root/anaconda2/lib/python2.7/site-packages/torch/utils/data/dataloader.py", line 96, in default_collate
    return torch.stack(batch, 0, out=out)
  File "/root/anaconda2/lib/python2.7/site-packages/torch/functional.py", line 64, in stack
    return torch.cat(inputs, dim)
RuntimeError: inconsistent tensor sizes at /opt/conda/conda-bld/pytorch_1513363039688/work/torch/lib/TH/generic/THTensorMath.c:2864
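
The traceback ends in default_collate calling torch.stack, which needs every sample in the batch to have the same size for a given field. One way to narrow this down might be to compare the per-field shapes of the samples before they are collated. Below is a minimal sketch of such a check, not code from this repository: find_inconsistent_fields and FakeSample are hypothetical names, and it assumes each dataset item is a tuple of objects that expose a .shape attribute.

```python
def shape_of(value):
    """Return the shape of a tensor-like object as a tuple, or None if it has no .shape."""
    shape = getattr(value, "shape", None)
    return tuple(shape) if shape is not None else None


def find_inconsistent_fields(samples):
    """Return {field_index: [shape of that field in each sample]} for mismatched fields.

    `samples` is a list of equal-length tuples, one tuple per dataset item,
    i.e. the same list that would be handed to the collate function.
    """
    mismatches = {}
    # zip(*samples) regroups the data field by field, like default_collate does.
    for field_index, field_values in enumerate(zip(*samples)):
        shapes = [shape_of(value) for value in field_values]
        if len(set(shapes)) > 1:
            mismatches[field_index] = shapes
    return mismatches


class FakeSample(object):
    """Hypothetical stand-in for a tensor; only carries a shape."""

    def __init__(self, *shape):
        self.shape = shape


if __name__ == "__main__":
    # Two items whose first field matches but whose second field differs in length.
    batch = [
        (FakeSample(3, 600, 800), FakeSample(30, 5)),
        (FakeSample(3, 600, 800), FakeSample(20, 5)),
    ]
    print(find_inconsistent_fields(batch))
```

Calling this on [dataset[i] for i in indices] for the failing batch would show which field of the dataset output differs between the two images, assuming the dataset returns tuples as sketched here.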

Thanks a lot for your help.

KevinQian97, Jun 16 '19 01:06