
loss output: is this normal?

Open askerlee opened this issue 6 years ago • 5 comments

Hi, I'm using ResNet101_BN_SCALE_Merged_OHEM on my own dataset. Some of the output losses (loss_bbox and loss_cls) are always 0.

Update: it seems there is something wrong with OHEM. When I turn off OHEM, everything is normal. The output with OHEM turned on:

I1019 22:34:34.436921 14581 solver.cpp:229] Iteration 760, loss = 0.0427504
I1019 22:34:34.436954 14581 solver.cpp:245]     Train net output #0: loss_bbox = 0 (* 1 = 0 loss)
I1019 22:34:34.436959 14581 solver.cpp:245]     Train net output #1: loss_cls = 0 (* 1 = 0 loss)
I1019 22:34:34.436962 14581 solver.cpp:245]     Train net output #2: rpn_cls_loss = 0.0208707 (* 1 = 0.0208707 loss)
I1019 22:34:34.436965 14581 solver.cpp:245]     Train net output #3: rpn_loss_bbox = 0.00372629 (* 1 = 0.00372629 loss)

The output with OHEM turned off:

I1020 14:29:00.407395 19371 solver.cpp:245]     Train net output #0: loss_bbox = 0.652186 (* 1 = 0.652186 loss)
I1020 14:29:00.407400 19371 solver.cpp:245]     Train net output #1: loss_cls = 0.654309 (* 1 = 0.654309 loss)
I1020 14:29:00.407404 19371 solver.cpp:245]     Train net output #2: rpn_cls_loss = 0.113032 (* 1 = 0.113032 loss)
I1020 14:29:00.407408 19371 solver.cpp:245]     Train net output #3: rpn_loss_bbox = 0.0568502 (* 1 = 0.0568502 loss)
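To check whether a loss is zero over a whole training log rather than at a single iteration, the "Train net output" lines can be scanned with a small script. This is only a sketch: the regular expression assumes the log format shown above, and the helper names `collect_losses` and `always_zero` are made up for illustration.

```python
import re
from collections import defaultdict

# Matches e.g. "Train net output #0: loss_bbox = 0 (* 1 = 0 loss)".
# Assumes the Caffe solver log format shown in this thread.
LINE_RE = re.compile(r"Train net output #\d+: (\w+) = ([-+0-9.eE]+|nan|inf)")


def collect_losses(log_text):
    """Group every reported value by output name, in log order."""
    losses = defaultdict(list)
    for match in LINE_RE.finditer(log_text):
        losses[match.group(1)].append(float(match.group(2)))
    return dict(losses)


def always_zero(losses):
    """Return the names of outputs whose value is 0 at every logged iteration."""
    return sorted(
        name for name, values in losses.items()
        if values and all(v == 0.0 for v in values)
    )


sample_log = """\
I1019 22:34:34.436954 14581 solver.cpp:245]     Train net output #0: loss_bbox = 0 (* 1 = 0 loss)
I1019 22:34:34.436959 14581 solver.cpp:245]     Train net output #1: loss_cls = 0 (* 1 = 0 loss)
I1019 22:34:34.436962 14581 solver.cpp:245]     Train net output #2: rpn_cls_loss = 0.0208707 (* 1 = 0.0208707 loss)
I1019 22:34:34.436965 14581 solver.cpp:245]     Train net output #3: rpn_loss_bbox = 0.00372629 (* 1 = 0.00372629 loss)
"""

suspicious = always_zero(collect_losses(sample_log))
print(suspicious)
```

On a real run, pass the full log file contents instead of `sample_log`; an output that stays in the returned list for many iterations is the symptom described here.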

askerlee avatar Oct 19 '17 14:10 askerlee

@askerlee @Eniac-Xie Could you please release your "ResNet101_BN_SCALE_Merged_OHEM" model files, including test.prototxt? The files I downloaded do not contain it, and I have no time to write it myself because of an emergency. Thank you very much!

whmin avatar Nov 08 '17 13:11 whmin

Just copy the test.prototxt from the ResNet101_BN_SCALE_Merged folder. They are the same (hard example mining only happens in training, so the test model is the same).

askerlee avatar Nov 09 '17 04:11 askerlee

OK, thank you! Now I get an error when I run ./experiments/scripts/faster_rcnn_end2end.sh 0 ResNet-50 pascal_voc, like this: [screenshot from 2017-11-08 21-08-43]. I did not change the original ResNet-50 OHEM code, other than replacing "num_classes" and "num_output". I cannot solve it; could you help me?

whmin avatar Nov 09 '17 04:11 whmin

Change cls_prob[i,label] to cls_prob[i,int(label)] in lib/roi_data_layer/layer.py:242.
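The error in the screenshot is not reproduced here, so the following is only a guess at why the cast helps: if the labels are stored as floats (an assumption), recent NumPy versions refuse to use them as array indices. A minimal sketch with made-up data and a hypothetical helper name, not the actual code in layer.py:

```python
import numpy as np


def per_roi_cls_prob(cls_prob, labels):
    """Pick, for each ROI i, the probability of its ground-truth class.

    Hypothetical helper for illustration. The int() cast mirrors the
    suggested fix: labels may arrive as floats, which NumPy rejects
    as indices.
    """
    picked = np.empty(len(labels), dtype=cls_prob.dtype)
    for i, label in enumerate(labels):
        picked[i] = cls_prob[i, int(label)]
    return picked


# Made-up example: 2 ROIs, 3 classes, labels stored as float32 (assumed).
cls_prob = np.array([[0.7, 0.2, 0.1],
                     [0.1, 0.3, 0.6]], dtype=np.float32)
labels = np.array([0.0, 2.0], dtype=np.float32)

# Without the cast, a float index raises IndexError on NumPy >= 1.12.
try:
    cls_prob[1, labels[1]]
    float_index_failed = False
except IndexError:
    float_index_failed = True

picked = per_roi_cls_prob(cls_prob, labels)
```

If the labels are already integers, the cast is harmless, so the change is safe either way.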

askerlee avatar Nov 09 '17 05:11 askerlee

Hi, I have encountered the same problem. Have you solved it? I think the problem may be in the code. When I used my own modified OHEM code to train with the author's prototxt file, the loss was not 0, but training was difficult to converge (a problem I did not see with VGG16).

oysz2016 avatar Jan 14 '19 07:01 oysz2016