pytorch-detect-rfcn
RuntimeError: inconsistent tensor sizes
Hi, sorry to bother you. I ran into a problem when running trainval_net.py with the following configuration: Namespace(batch_size=2, checkepoch=1, checkpoint=0, checkpoint_interval=10000, checksession=1, class_agnostic=False, cuda=True, dataset='imagenet_vid+imagenet_det', disp_interval=100, large_scale=False, lr=0.001, lr_decay_gamma=0.1, lr_decay_step=5, mGPUs=True, max_epochs=20, net='res101', num_workers=0, optimizer='sgd', resume=False, save_dir='output/models', session=1, start_epoch=1, use_tfboard=False)
The environment is PyTorch 0.3.0 + CUDA 8 + cuDNN 7.1.
The program crashed after training for several iterations:
[session 1][epoch 1][iter 100] loss: 1.8615, lr: 1.00e-03
fg/bg=(4/252), time cost: 46.256225
rpn_cls: 0.2007, rpn_box: 0.0260, rcnn_cls: 0.1744, rcnn_box 0.0001
[session 1][epoch 1][iter 200] loss: 1.2458, lr: 1.00e-03
fg/bg=(46/210), time cost: 44.331184
rpn_cls: 0.0870, rpn_box: 0.0658, rcnn_cls: 1.1659, rcnn_box 0.5316
[session 1][epoch 1][iter 300] loss: 1.0413, lr: 1.00e-03
fg/bg=(21/235), time cost: 45.259623
rpn_cls: 0.0649, rpn_box: 0.0936, rcnn_cls: 0.6243, rcnn_box 0.2626
[session 1][epoch 1][iter 400] loss: 1.1367, lr: 1.00e-03
fg/bg=(8/248), time cost: 45.338590
rpn_cls: 0.1623, rpn_box: 0.2133, rcnn_cls: 0.3564, rcnn_box 0.0785
[session 1][epoch 1][iter 500] loss: 1.2001, lr: 1.00e-03
fg/bg=(43/213), time cost: 45.177281
rpn_cls: 0.3296, rpn_box: 0.0614, rcnn_cls: 1.0575, rcnn_box 0.4794
[session 1][epoch 1][iter 600] loss: 1.1716, lr: 1.00e-03
fg/bg=(35/221), time cost: 45.165913
rpn_cls: 0.1592, rpn_box: 0.0326, rcnn_cls: 0.8674, rcnn_box 0.3584
[session 1][epoch 1][iter 700] loss: 1.1964, lr: 1.00e-03
fg/bg=(38/218), time cost: 45.357144
rpn_cls: 0.0814, rpn_box: 0.0552, rcnn_cls: 0.8258, rcnn_box 0.3607
[session 1][epoch 1][iter 800] loss: 1.2329, lr: 1.00e-03
fg/bg=(18/238), time cost: 45.654004
rpn_cls: 0.1239, rpn_box: 0.0149, rcnn_cls: 0.4786, rcnn_box 0.1871
[session 1][epoch 1][iter 900] loss: 1.2289, lr: 1.00e-03
fg/bg=(40/216), time cost: 45.081429
rpn_cls: 0.0440, rpn_box: 0.0146, rcnn_cls: 0.8026, rcnn_box 0.4304
[session 1][epoch 1][iter 1000] loss: 1.0724, lr: 1.00e-03
fg/bg=(18/238), time cost: 45.640391
rpn_cls: 0.0690, rpn_box: 0.0864, rcnn_cls: 0.4955, rcnn_box 0.1855
[session 1][epoch 1][iter 1100] loss: 1.1550, lr: 1.00e-03
fg/bg=(37/219), time cost: 44.558274
rpn_cls: 0.0837, rpn_box: 0.0504, rcnn_cls: 0.8120, rcnn_box 0.3720
[session 1][epoch 1][iter 1200] loss: 1.3002, lr: 1.00e-03
fg/bg=(13/243), time cost: 44.615023
rpn_cls: 0.2130, rpn_box: 0.0238, rcnn_cls: 0.3702, rcnn_box 0.1142
[session 1][epoch 1][iter 1300] loss: 1.1840, lr: 1.00e-03
fg/bg=(47/209), time cost: 44.864319
rpn_cls: 0.1111, rpn_box: 0.0345, rcnn_cls: 0.7959, rcnn_box 0.4722
[session 1][epoch 1][iter 1400] loss: 1.1256, lr: 1.00e-03
fg/bg=(25/231), time cost: 45.451395
rpn_cls: 0.1346, rpn_box: 0.0734, rcnn_cls: 0.6207, rcnn_box 0.2413
[session 1][epoch 1][iter 1500] loss: 1.0402, lr: 1.00e-03
fg/bg=(11/245), time cost: 44.467964
rpn_cls: 0.1048, rpn_box: 0.0308, rcnn_cls: 0.2073, rcnn_box 0.1033
[session 1][epoch 1][iter 1600] loss: 1.1574, lr: 1.00e-03
fg/bg=(31/225), time cost: 45.622758
rpn_cls: 0.1083, rpn_box: 0.0375, rcnn_cls: 0.6225, rcnn_box 0.2407
[session 1][epoch 1][iter 1700] loss: 1.1521, lr: 1.00e-03
fg/bg=(14/242), time cost: 44.946758
rpn_cls: 0.1012, rpn_box: 0.0402, rcnn_cls: 0.2832, rcnn_box 0.1185
[session 1][epoch 1][iter 1800] loss: 1.1786, lr: 1.00e-03
fg/bg=(34/222), time cost: 45.144423
rpn_cls: 0.0801, rpn_box: 0.0906, rcnn_cls: 0.5499, rcnn_box 0.3785
Traceback (most recent call last):
File "trainval_net.py", line 329, in
Thanks a lot for your help.