How about the result with SSD?

Open bailvwangzi opened this issue 6 years ago • 35 comments

I'm glad to see your work with focal loss. Have you gotten better performance with focal loss than with OHEM in SSD? Also, have you tested focal loss with your other work, MobileNet-SSD? Thanks!

bailvwangzi avatar Aug 31 '17 10:08 bailvwangzi

I tested the solution. The loss computation may have errors: the loss drops fast at first, but after a few hundred iterations it keeps getting bigger and bigger.

mychina75 avatar Sep 01 '17 10:09 mychina75

@mychina75 I found the error and corrected it; for verification I checked the gradient with check_focal_diff.py. Sorry for my mistake, please check out the new code and test. @bailvwangzi I tested it on MobileNet-SSD for 30000 iterations, and the mAP is ~0.717, a slight drop. Now I'm training with some other gamma values and hope to get better performance.
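For anyone who wants to reproduce that kind of verification, below is a minimal NumPy sketch of a finite-difference gradient check for the alpha-balanced softmax focal loss FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t); the function names are illustrative, not the repo's actual API:

```python
import numpy as np

np.random.seed(0)

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def focal_loss(z, t, alpha=0.25, gamma=2.0):
    # FL(pt) = -alpha * (1 - pt)^gamma * log(pt), with pt = softmax(z)[t]
    pt = softmax(z)[t]
    return -alpha * (1.0 - pt) ** gamma * np.log(pt)

def focal_grad(z, t, alpha=0.25, gamma=2.0):
    # Analytic gradient w.r.t. the logits z (chain rule through the softmax).
    p = softmax(z)
    pt = p[t]
    dL_dpt = alpha * (gamma * (1.0 - pt) ** (gamma - 1.0) * np.log(pt)
                      - (1.0 - pt) ** gamma / pt)
    dpt_dz = pt * (np.eye(len(z))[t] - p)  # d softmax(z)[t] / dz
    return dL_dpt * dpt_dz

# Central finite differences vs. the analytic gradient.
z, t, eps = np.random.randn(5), 2, 1e-5
num = np.zeros_like(z)
for j in range(len(z)):
    zp, zm = z.copy(), z.copy()
    zp[j] += eps
    zm[j] -= eps
    num[j] = (focal_loss(zp, t) - focal_loss(zm, t)) / (2 * eps)
print(np.allclose(num, focal_grad(z, t), atol=1e-6))  # should print True
```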

chuanqi305 avatar Sep 02 '17 14:09 chuanqi305

The loss decreases steadily now, but the model's evaluation results keep getting worse:

```
Line 7015: I0904 13:14:38.944118 13639 solver.cpp:546] Test net output #0: detection_eval = 0.0882107
Line 8234: I0904 14:54:06.567481 13639 solver.cpp:546] Test net output #0: detection_eval = 0.0710198
Line 9455: I0904 16:33:22.908924 13639 solver.cpp:546] Test net output #0: detection_eval = 0.0636553
```

mychina75 avatar Sep 05 '17 01:09 mychina75

@mychina75 What's the final loss value? In my test the evaluation is OK.

chuanqi305 avatar Sep 05 '17 01:09 chuanqi305

@chuanqi305 I tested your focal loss with SSD, not MobileNet-SSD. I merged your code and changed mining_type to NONE. The final loss decreases to 0.3, but detection_eval = 74%, worse than OHEM's 77%. Do you have other training tricks?

bailvwangzi avatar Sep 05 '17 10:09 bailvwangzi

@bailvwangzi No, I did not get a higher mAP either. It was just the same as OHEM.

chuanqi305 avatar Sep 06 '17 00:09 chuanqi305

@chuanqi305 I trained the model on the 80-class COCO dataset, and it looks like the model keeps getting worse, although the loss values are normal... I've never run into this before, so I stopped training early. Any clue about this?

```
I0904 13:14:08.266741 13639 solver.cpp:433] Iteration 2000, Testing net (#0)
I0904 13:14:08.304054 13639 net.cpp:693] Ignoring source layer mbox_loss
W0904 13:14:38.931223 13639 solver.cpp:524] Missing true_pos for label: 30
W0904 13:14:38.931674 13639 solver.cpp:524] Missing true_pos for label: 32
W0904 13:14:38.932234 13639 solver.cpp:524] Missing true_pos for label: 35
W0904 13:14:38.934804 13639 solver.cpp:524] Missing true_pos for label: 43
W0904 13:14:38.941249 13639 solver.cpp:524] Missing true_pos for label: 65
W0904 13:14:38.941372 13639 solver.cpp:524] Missing true_pos for label: 71
W0904 13:14:38.944011 13639 solver.cpp:524] Missing true_pos for label: 77
W0904 13:14:38.944099 13639 solver.cpp:524] Missing true_pos for label: 79
W0904 13:14:38.944110 13639 solver.cpp:524] Missing true_pos for label: 80
I0904 13:14:38.944118 13639 solver.cpp:546] Test net output #0: detection_eval = 0.0882107
I0904 13:14:41.769762 13639 solver.cpp:243] Iteration 2000, loss = 2.7271
I0904 13:14:41.769806 13639 solver.cpp:259] Train net output #0: mbox_loss = 3.15971 (* 1 = 3.15971 loss)
########
I0904 14:53:46.069953 13639 solver.cpp:433] Iteration 4000, Testing net (#0)
I0904 14:53:46.070041 13639 net.cpp:693] Ignoring source layer mbox_loss
I0904 14:53:46.254638 13639 blocking_queue.cpp:50] Data layer prefetch queue empty
W0904 14:54:06.554822 13639 solver.cpp:524] Missing true_pos for label: 13
W0904 14:54:06.556159 13639 solver.cpp:524] Missing true_pos for label: 30
W0904 14:54:06.556406 13639 solver.cpp:524] Missing true_pos for label: 32
W0904 14:54:06.556752 13639 solver.cpp:524] Missing true_pos for label: 35
W0904 14:54:06.560189 13639 solver.cpp:524] Missing true_pos for label: 43
W0904 14:54:06.560204 13639 solver.cpp:524] Missing true_pos for label: 44
W0904 14:54:06.560220 13639 solver.cpp:524] Missing true_pos for label: 45
W0904 14:54:06.566228 13639 solver.cpp:524] Missing true_pos for label: 65
W0904 14:54:06.566258 13639 solver.cpp:524] Missing true_pos for label: 69
W0904 14:54:06.566284 13639 solver.cpp:524] Missing true_pos for label: 71
W0904 14:54:06.567383 13639 solver.cpp:524] Missing true_pos for label: 77
W0904 14:54:06.567459 13639 solver.cpp:524] Missing true_pos for label: 79
W0904 14:54:06.567472 13639 solver.cpp:524] Missing true_pos for label: 80
I0904 14:54:06.567481 13639 solver.cpp:546] Test net output #0: detection_eval = 0.0710198
I0904 14:54:09.375293 13639 solver.cpp:243] Iteration 4000, loss = 2.80046
I0904 14:54:09.375329 13639 solver.cpp:259] Train net output #0: mbox_loss = 2.84744 (* 1 = 2.84744 loss)
########
I0904 16:33:02.799624 13639 solver.cpp:433] Iteration 6000, Testing net (#0)
I0904 16:33:02.799713 13639 net.cpp:693] Ignoring source layer mbox_loss
I0904 16:33:05.091681 13639 blocking_queue.cpp:50] Data layer prefetch queue empty
W0904 16:33:22.895887 13639 solver.cpp:524] Missing true_pos for label: 13
W0904 16:33:22.897629 13639 solver.cpp:524] Missing true_pos for label: 25
W0904 16:33:22.897707 13639 solver.cpp:524] Missing true_pos for label: 30
W0904 16:33:22.897748 13639 solver.cpp:524] Missing true_pos for label: 32
W0904 16:33:22.897907 13639 solver.cpp:524] Missing true_pos for label: 35
W0904 16:33:22.897919 13639 solver.cpp:524] Missing true_pos for label: 36
W0904 16:33:22.900342 13639 solver.cpp:524] Missing true_pos for label: 43
W0904 16:33:22.900359 13639 solver.cpp:524] Missing true_pos for label: 44
W0904 16:33:22.900367 13639 solver.cpp:524] Missing true_pos for label: 45
W0904 16:33:22.906970 13639 solver.cpp:524] Missing true_pos for label: 65
W0904 16:33:22.906996 13639 solver.cpp:524] Missing true_pos for label: 69
W0904 16:33:22.907037 13639 solver.cpp:524] Missing true_pos for label: 71
W0904 16:33:22.908805 13639 solver.cpp:524] Missing true_pos for label: 77
W0904 16:33:22.908903 13639 solver.cpp:524] Missing true_pos for label: 79
W0904 16:33:22.908915 13639 solver.cpp:524] Missing true_pos for label: 80
I0904 16:33:22.908924 13639 solver.cpp:546] Test net output #0: detection_eval = 0.0636553
I0904 16:33:25.634253 13639 solver.cpp:243] Iteration 6000, loss = 2.73545
I0904 16:33:25.634300 13639 solver.cpp:259] Train net output #0: mbox_loss = 2.85784 (* 1 = 2.85784 loss)
```

mychina75 avatar Sep 06 '17 01:09 mychina75

My implementation is almost the same as yours, apart from some minor differences that are described in the paper. I will test my function and investigate whether these differences are crucial.

XiongweiWu avatar Sep 06 '17 13:09 XiongweiWu

@mychina75 That's too few iterations; you should evaluate after iteration 30000~50000.

chuanqi305 avatar Sep 07 '17 00:09 chuanqi305

@XiongweiWu Can you share some details? In my test the performance has not improved; focal loss is not better than OHEM.

chuanqi305 avatar Sep 07 '17 00:09 chuanqi305

@chuanqi305

```
I0907 19:47:47.503582 40167 solver.cpp:243] Iteration 0, loss = 538.726
I0907 19:47:47.503639 40167 solver.cpp:259] Train net output #0: mbox_loss = 538.726 (* 1 = 538.726 loss)
I0907 19:47:47.503693 40167 sgd_solver.cpp:138] Iteration 0, lr = 0.001
I0907 19:47:47.523170 40167 blocking_queue.cpp:50] Data layer prefetch queue empty
I0907 19:47:59.226004 40167 solver.cpp:243] Iteration 10, loss = 488.884
I0907 19:47:59.226058 40167 solver.cpp:259] Train net output #0: mbox_loss = 418.708 (* 1 = 418.708 loss)
I0907 19:47:59.226068 40167 sgd_solver.cpp:138] Iteration 10, lr = 0.001
I0907 19:48:11.308161 40167 solver.cpp:243] Iteration 20, loss = 412.334
I0907 19:48:11.308215 40167 solver.cpp:259] Train net output #0: mbox_loss = 393.423 (* 1 = 393.423 loss)
I0907 19:48:11.308225 40167 sgd_solver.cpp:138] Iteration 20, lr = 0.001
I0907 19:48:24.216085 40167 solver.cpp:243] Iteration 30, loss = 426.297
I0907 19:48:24.216294 40167 solver.cpp:259] Train net output #0: mbox_loss = 269.242 (* 1 = 269.242 loss)
I0907 19:48:24.216308 40167 sgd_solver.cpp:138] Iteration 30, lr = 0.001
I0907 19:48:36.642977 40167 solver.cpp:243] Iteration 40, loss = 449.73
I0907 19:48:36.643034 40167 solver.cpp:259] Train net output #0: mbox_loss = 424.498 (* 1 = 424.498 loss)
I0907 19:48:36.643045 40167 sgd_solver.cpp:138] Iteration 40, lr = 0.001
I0907 19:48:49.470823 40167 solver.cpp:243] Iteration 50, loss = 520.721
I0907 19:48:49.470880 40167 solver.cpp:259] Train net output #0: mbox_loss = 450.236 (* 1 = 450.236 loss)
I0907 19:48:49.470890 40167 sgd_solver.cpp:138] Iteration 50, lr = 0.001
I0907 19:49:01.526100 40167 solver.cpp:243] Iteration 60, loss = 470.837
I0907 19:49:01.526652 40167 solver.cpp:259] Train net output #0: mbox_loss = 504.9 (* 1 = 504.9 loss)
I0907 19:49:01.526669 40167 sgd_solver.cpp:138] Iteration 60, lr = 0.001
I0907 19:49:15.080325 40167 solver.cpp:243] Iteration 70, loss = 441.191
I0907 19:49:15.080377 40167 solver.cpp:259] Train net output #0: mbox_loss = 343.061 (* 1 = 343.061 loss)
I0907 19:49:15.080387 40167 sgd_solver.cpp:138] Iteration 70, lr = 0.001
I0907 19:49:27.861601 40167 solver.cpp:243] Iteration 80, loss = 416.44
I0907 19:49:27.861662 40167 solver.cpp:259] Train net output #0: mbox_loss = 524.938 (* 1 = 524.938 loss)
I0907 19:49:27.861677 40167 sgd_solver.cpp:138] Iteration 80, lr = 0.001
I0907 19:49:40.567715 40167 solver.cpp:243] Iteration 90, loss = 419.763
I0907 19:49:40.568455 40167 solver.cpp:259] Train net output #0: mbox_loss = 485.486 (* 1 = 485.486 loss)
I0907 19:49:40.568467 40167 sgd_solver.cpp:138] Iteration 90, lr = 0.001
I0907 19:49:52.489009 40167 solver.cpp:243] Iteration 100, loss = 496.385
I0907 19:49:52.489078 40167 solver.cpp:259] Train net output #0: mbox_loss = 598.885 (* 1 = 598.885 loss)
I0907 19:49:52.489092 40167 sgd_solver.cpp:138] Iteration 100, lr = 0.001
I0907 19:50:04.454450 40167 solver.cpp:243] Iteration 110, loss = 440.035
I0907 19:50:04.454507 40167 solver.cpp:259] Train net output #0: mbox_loss = 552.493 (* 1 = 552.493 loss)
I0907 19:50:04.454519 40167 sgd_solver.cpp:138] Iteration 110, lr = 0.001
```

Is this normal?

jinxuan777 avatar Sep 07 '17 11:09 jinxuan777

No, the loss should be < 10 after 10 iterations. Maybe there is a bug in your network structure?

chuanqi305 avatar Sep 08 '17 00:09 chuanqi305

@chuanqi305 Sorry for the late reply. I ran a series of experiments on VOC07 with Fast R-CNN and a ZF backbone. The baseline is 57.1%. In your implementation, alpha is shared across all categories and a single (K+1)-way classifier is learned, whereas the paper says K two-class classifiers are trained and alpha is class-dependent. Using your code directly I achieve 53.3% mAP in my settings, and when I replace all alphas with 1 the accuracy reaches 57.4%, slightly better than the baseline. However, when I use all proposals for training, the performance drops to 56.8% (worse than OHEM). The difficulty is the loss weight on the bounding-box regression loss, since we cannot use all samples to smooth it. I will test on SSD today, and I hope you can also share some results.
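For comparison, here is a minimal NumPy sketch of the paper-style formulation described above — K independent binary (sigmoid) classifiers with a class-dependent alpha — assuming label 0 is background; names and shapes are illustrative, not the repo's actual code:

```python
import numpy as np

def sigmoid_focal_loss(logits, labels, alpha=0.25, gamma=2.0):
    # logits: (N, K) scores for K foreground classes (background has no logit);
    # labels: (N,) integer labels in [0, K], where 0 means background.
    N, K = logits.shape
    y = np.zeros((N, K))
    fg = labels > 0
    y[fg, labels[fg] - 1] = 1.0                 # one-hot over foreground classes
    p = 1.0 / (1.0 + np.exp(-logits))           # K independent binary classifiers
    pt = np.where(y == 1, p, 1.0 - p)           # probability of the true binary label
    at = np.where(y == 1, alpha, 1.0 - alpha)   # class-dependent (pos/neg) alpha
    return -(at * (1.0 - pt) ** gamma * np.log(np.clip(pt, 1e-12, 1.0))).sum()

# Toy usage: 4 anchors, 20 foreground classes (VOC); anchors 0 and 2 are background.
logits = np.random.randn(4, 20)
labels = np.array([0, 3, 0, 15])
print(sigmoid_focal_loss(logits, labels))
```

The contrast with this repo's layer is that a softmax couples all K+1 scores through one normalization, while the paper's K sigmoids let alpha down-weight the overwhelming background negatives independently per class.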

XiongweiWu avatar Sep 12 '17 06:09 XiongweiWu

@XiongweiWu Hi, I changed a two-stage net, RON (very similar to FPN), into a one-stage net as the paper did, and used all proposals for training, but my AP is too low. Do you have time to check my net? Thanks.

zhanglonghao1992 avatar Sep 18 '17 02:09 zhanglonghao1992

@bailvwangzi I'm training the normal SSD and SSD with focal loss side by side, using ResNet-101 as the base network. The detection_eval of normal SSD is 0.68 at iteration 10000, but the detection_eval of SSD with focal loss is just 0.45 at iteration 20000. It seems SSD with focal loss becomes very hard to train. Did you run into this during training?

zhanglonghao1992 avatar Sep 19 '17 08:09 zhanglonghao1992

@zhanglonghao1992 Same here. I got up to 74 mAP after 180k iterations. To avoid the effect of initialization, I use a normal SSD model (e.g. a normal-SSD iteration-10000 snapshot) as the pretrained model and finetune from it; it converges faster.
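A minimal pycaffe sketch of that finetuning recipe; the file paths are hypothetical, so point them at your own solver and snapshot:

```python
import caffe

caffe.set_mode_gpu()
# Hypothetical path: the solver for the focal-loss training config.
solver = caffe.SGDSolver('models/ssd_focal/solver.prototxt')
# Initialize from a normal-SSD snapshot instead of the usual ImageNet weights;
# copy_from matches layers by name, so the shared backbone and heads carry over.
solver.net.copy_from('models/ssd_normal/ssd_normal_iter_10000.caffemodel')
solver.solve()
```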

bailvwangzi avatar Sep 19 '17 08:09 bailvwangzi

@bailvwangzi @zhanglonghao1992 Hi, I just finished the ablation experiment on SSD with focal loss trained on the VOC07 dataset. The performance of SSD is not as good as the paper says ><. SSD's benchmark is 77.4% with data augmentation and 62% without, while my focal-loss results are 74.1% and 66%. I remember the original paper said they removed all data augmentation tricks except mirroring. I need more time to investigate; maybe it's the dataset, maybe the learning parameters, maybe the implementation (though the implementation should be quite simple...).

XiongweiWu avatar Sep 19 '17 10:09 XiongweiWu

@XiongweiWu Nice ablation work, thanks. Looking forward to your better results!

bailvwangzi avatar Sep 19 '17 11:09 bailvwangzi

@bailvwangzi Hi, my mAP is still 0.6 after 180k iterations using SSD with focal loss... You said your mAP at 180k iterations is 0.74? How did you do that? Did you change the learning rate or use a normal SSD model to initialize the model?

zhanglonghao1992 avatar Sep 25 '17 02:09 zhanglonghao1992

@chuanqi305 Hi, I used your code on SSD with ResNet-101 but the final result is 0.6... Did you change the learning rate or some other parameters? What about the pretrained model?

zhanglonghao1992 avatar Sep 26 '17 01:09 zhanglonghao1992

@chuanqi305 When I use VGG16, lr_rate=0.001 makes the loss go to NaN, but ResNet-101 is OK with 0.001. I have to set lr_rate=0.0001 to train VGG16. Why? Do I have to change alpha and gamma?

zhanglonghao1992 avatar Sep 26 '17 02:09 zhanglonghao1992

@XiongweiWu Hi, could you leave your QQ or e-mail address? I'm having some trouble training SSD with focal loss on VGG16 and ResNet-101.

zhanglonghao1992 avatar Sep 26 '17 02:09 zhanglonghao1992

@mychina75 Same problem here... Have you solved it?

zhanglonghao1992 avatar Sep 26 '17 05:09 zhanglonghao1992

@zhanglonghao1992 No... I can't get a better result... Maybe some parameters need to change?

mychina75 avatar Sep 26 '17 09:09 mychina75

@mychina75 It only happens when I use VGG16. The 'Missing true_pos for label' warning never appears when I use ResNet-101. I don't know why.

zhanglonghao1992 avatar Sep 26 '17 10:09 zhanglonghao1992

@chuanqi305 Thank you very much for sharing your focal loss implementation. I tested your code and also found no improvement over the original SSD. Maybe focal loss is not the key factor for RetinaNet?

pbdahzou avatar Oct 15 '17 03:10 pbdahzou

@pbdahzou Maybe focal loss is simply similar to OHEM in its training effect. RetinaNet uses the FPN framework; maybe the key factor is the 'Deconvolution' (top-down upsampling) path.

chuanqi305 avatar Oct 31 '17 14:10 chuanqi305

Has anyone tried both kinds of losses together, i.e. something like this?

layer { name: "mbox_loss" type: "MultiBoxLoss" bottom: "mbox_loc" bottom: "mbox_conf" bottom: "mbox_priorbox" bottom: "label" top: "mbox_loss" include { phase: TRAIN } propagate_down: true propagate_down: true propagate_down: false propagate_down: false loss_param { normalization: VALID } loss_weight: 0.5 multibox_loss_param { loc_loss_type: SMOOTH_L1 conf_loss_type: SOFTMAX loc_weight: 0.5 num_classes: 21 share_location: true match_type: PER_PREDICTION overlap_threshold: 0.5 use_prior_for_matching: true background_label_id: 0 use_difficult_gt: true neg_pos_ratio: 3.0 neg_overlap: 0.5 code_type: CENTER_SIZE ignore_cross_boundary_bbox: false mining_type: MAX_NEGATIVE } }

layer { name: "mbox_focal_loss" type: "MultiBoxFocalLoss" #change the type bottom: "mbox_loc" bottom: "mbox_conf" bottom: "mbox_priorbox" bottom: "label" top: "mbox_focal_loss" include { phase: TRAIN } propagate_down: true propagate_down: true propagate_down: false propagate_down: false loss_param { normalization: VALID } loss_weight: 0.5 focal_loss_param { #set the alpha and gamma, default is alpha=0.25, gamma=2.0 alpha: 0.25 gamma: 2.0 } multibox_loss_param { loc_loss_type: SMOOTH_L1 conf_loss_type: SOFTMAX loc_weight: 1.0 num_classes: 21 share_location: true match_type: PER_PREDICTION overlap_threshold: 0.5 use_prior_for_matching: true background_label_id: 0 use_difficult_gt: true neg_pos_ratio: 3.0 neg_overlap: 0.5 code_type: CENTER_SIZE ignore_cross_boundary_bbox: false mining_type: NONE #do not use OHEM } }

mathmanu avatar Nov 24 '17 13:11 mathmanu

It seems this implements a softmax focal loss, whereas the original RetinaNet paper describes using a sigmoid instead of a softmax to compute p (see Equation 5 and the paragraph below it).

Also see this discussion: https://github.com/kuangliu/pytorch-retinanet/issues/6

Has anyone tried sigmoid for the focal loss layer?
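For reference, the sigmoid focal loss of Equation 5 can be written directly on the logits, which also keeps log(p_t) finite for extreme scores; a minimal NumPy sketch (illustrative, not this repo's layer):

```python
import numpy as np

def softplus(x):
    # log(1 + exp(x)) computed without overflow for large |x|
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

def stable_sigmoid_focal_loss(z, y, alpha=0.25, gamma=2.0):
    # z: logits, y: binary targets in {0, 1}, same shape.
    p = 1.0 / (1.0 + np.exp(-z))
    pt = np.where(y == 1, p, 1.0 - p)                        # prob. of the true label
    log_pt = np.where(y == 1, -softplus(-z), -softplus(z))   # stable log(pt)
    at = np.where(y == 1, alpha, 1.0 - alpha)
    return -(at * (1.0 - pt) ** gamma * log_pt).sum()

z = np.array([-8.0, 0.5, 12.0])  # even confident logits stay numerically finite
y = np.array([0, 1, 1])
print(stable_sigmoid_focal_loss(z, y))
```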

mathmanu avatar Nov 24 '17 16:11 mathmanu

@mathmanu In my tests it was even worse.

XiongweiWu avatar Nov 26 '17 03:11 XiongweiWu