caffe-jacinto
Support for RoiPooling Layer
Hello, I see there are layers for SSD-like OD models. Is there any plan to include a RoiPooling layer to run RCNN-like models?
Hi,
Can you explain more about your target application? Why do you think "RCNN-like models" will be better than "SSD-like OD models"?
"RCNN-like models" have better accuracy for small objects like joint prediction of pedestrian & cars. Also if we reduce the region proposals to < 50, they are as efficient as SSD. SSD has much higher number of mboxes for the same mAP so the performance on CPU is not great. On GPU, that will not be the case. I am looking for 20-30 fps level of performance. I am targeting faster-RNN at the moment.
At this stage I am trying to understand the pros and cons of various meta architectures. So I have more questions than answers.
(1) If we compare FasterRCNN and RFCN, RFCN seems to be simpler, and yet comparable in accuracy. So isn't RFCN a better choice?
(2) Have you looked at RefineDet (a variant of SSD)? https://arxiv.org/pdf/1711.06897.pdf RefineDet seems to have higher overall AP, as well as higher AP for small objects, compared to SSD, FasterRCNN, R-FCN and RetinaNet.
(3) RetinaNet is quite close - so isn't that a good choice as well?
I am currently trying to run these OD models on a TDA2px EVM board (TI) at ~22 fps.
- I am fine with either of them (RCNN-based, two-step methods). Both of them will require a RoiPooling layer.
- RefineDet #bboxes = 16320. This will be a big hit on performance, since I am running the detection layer on the DSP (see the sketch after this list).
- I have yet to evaluate RetinaNet, but I think it will be comparable to SSD in terms of performance (it is an SSD-like variant).
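As a rough check of that 16320 number, here is a back-of-the-envelope count, assuming the RefineDet512 configuration from the paper (anchors on four feature maps with strides 8, 16, 32 and 64, three aspect ratios per location). It is only a sketch, not a reproduction of the implementation:

```python
# Back-of-the-envelope anchor count for RefineDet512 vs. a pruned RPN.
# Assumed configuration: four detection feature maps with strides 8/16/32/64
# at a 512x512 input, three aspect ratios per anchor location.

input_size = 512
strides = [8, 16, 32, 64]          # feature maps tapped for detection
anchors_per_location = 3           # aspect ratios {0.5, 1.0, 2.0}

refinedet_boxes = sum((input_size // s) ** 2 * anchors_per_location for s in strides)
print("RefineDet512 anchors:", refinedet_boxes)   # -> 16320, the number quoted above

# A two-step detector prunes RPN proposals before the per-ROI head runs,
# e.g. keeping only the top-k boxes after NMS.
rpn_keep_topk = 50
print("Boxes seen by the second-stage head:", rpn_keep_topk)
```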
I have some more questions. Please bear with me.
- Can I ask what performance and input resolution you expect for SSD?
- Could you explain why RefineDet will have more #bboxes compared to SSD?
- In two-step methods as well, the first RPN step has a lot of bboxes. Why would these methods be faster than SSD?
No problem. Here are the details:
- ~25-30 fps at 1024x512 input resolution with 0.5-0.6 mAP.
- I looked at the architecture. It shows a higher number of bboxes. Typically, increasing the number of bboxes helps with mAP.
- In the RPN step we can prune the bboxes to, say, 50; the detection layer will then evaluate only those bboxes. In the case of an SSD-like architecture, #bboxes is roughly (#aspect_ratios) * (#anchor_locations) * (#feature-map taps), and the detection layer needs to process all of these bboxes (see the sketch below).
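To make that formula concrete, here is a rough count using the standard SSD300 head configuration from the SSD paper; the exact feature-map sizes and boxes-per-location below are assumptions for illustration:

```python
# Rough #bbox count for an SSD-style detector using the formula above:
# boxes ~= sum over detection heads of (grid cells) * (anchors per location).
# Assumed configuration: the standard SSD300 heads from the SSD paper.

feature_map_sizes = [38, 19, 10, 5, 3, 1]     # spatial size of each tapped feature map
boxes_per_location = [4, 6, 6, 6, 4, 4]       # aspect-ratio/scale combinations per cell

total_boxes = sum(fs * fs * b for fs, b in zip(feature_map_sizes, boxes_per_location))
print("SSD300 prior boxes:", total_boxes)     # -> 8732

# The SSD detection-output layer has to decode, score and NMS all of these,
# whereas a two-step head only sees the few proposals that survive the RPN.
```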
Regarding the third point above:
My question was that the RPN stage also has a large number of bboxes to process, i.e. create the boxes, then sort/select, NMS and then another sort/select.
Is that complexity smaller than in the case of SSD? Why would that be?
Please correct me if my understanding is not up to the mark: for Faster-RCNN, class-specific prediction is done only on the RPN output boxes (~50); in the case of SSD it is done on all the boxes. A sketch of that pruning step is below.
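Here is a minimal Python/NumPy sketch of the RPN pruning pipeline described above (score, sort/select top-k, NMS, another top-k). The thresholds, top-k values and box format are illustrative assumptions, not values taken from caffe-jacinto:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.7):
    """Greedy NMS. boxes: (N, 4) as [x1, y1, x2, y2]; returns kept indices."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the best remaining box with all others still in the queue.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]
    return keep

def prune_proposals(boxes, scores, pre_nms_topk=2000, post_nms_topk=50):
    """Sort/select, NMS, then another sort/select, as in a typical RPN."""
    order = scores.argsort()[::-1][:pre_nms_topk]
    boxes, scores = boxes[order], scores[order]
    keep = nms(boxes, scores)[:post_nms_topk]
    return boxes[keep], scores[keep]
```

Only the boxes returned by `prune_proposals` are seen by the class-specific second stage, while an SSD detection-output layer must decode and score every prior box.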
Here is an interesting read: http://openaccess.thecvf.com/content_cvpr_2017/papers/Huang_SpeedAccuracy_Trade-Offs_for_CVPR_2017_paper.pdf
In the case of SSD/RetinaNet, score and bbox predictions are computed using regular convolution layers, which can be accelerated well. For two-stage methods, the ROI pooling and FC layers need to be computed for each pruned box, and their execution time is relatively higher than that of convolutions. You may need to consider these trade-offs while making a decision.
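To illustrate the per-box cost being discussed, below is a toy NumPy version of ROI max pooling. The shapes, the `spatial_scale` value and the ROI format `[x1, y1, x2, y2]` in image pixels are assumptions for illustration; this is not the actual layer implementation:

```python
import numpy as np

def roi_max_pool(feature_map, rois, pooled_h=7, pooled_w=7, spatial_scale=1.0 / 16):
    """feature_map: (C, H, W); rois: (N, 4); returns (N, C, pooled_h, pooled_w).

    Unlike a shared convolution, this loop (and the FC layers that follow it in
    Fast/Faster-RCNN heads) runs once per pruned box.
    """
    C, H, W = feature_map.shape
    out = np.zeros((len(rois), C, pooled_h, pooled_w), dtype=feature_map.dtype)
    for n, (x1, y1, x2, y2) in enumerate(rois):
        # Project the ROI from image coordinates onto the feature map and clamp.
        fx1 = min(max(int(np.floor(x1 * spatial_scale)), 0), W - 1)
        fy1 = min(max(int(np.floor(y1 * spatial_scale)), 0), H - 1)
        fx2 = min(max(int(np.ceil(x2 * spatial_scale)), fx1 + 1), W)
        fy2 = min(max(int(np.ceil(y2 * spatial_scale)), fy1 + 1), H)
        roi = feature_map[:, fy1:fy2, fx1:fx2]
        rh, rw = roi.shape[1], roi.shape[2]
        # Divide the ROI into a pooled_h x pooled_w grid and max-pool each bin.
        ys = np.linspace(0, rh, pooled_h + 1).astype(int)
        xs = np.linspace(0, rw, pooled_w + 1).astype(int)
        for i in range(pooled_h):
            for j in range(pooled_w):
                y0, y1b = ys[i], max(ys[i + 1], ys[i] + 1)
                x0, x1b = xs[j], max(xs[j + 1], xs[j] + 1)
                out[n, :, i, j] = roi[:, y0:y1b, x0:x1b].max(axis=(1, 2))
    return out
```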
Please see Table 2 in the RefineDet paper, where they compare several detectors. https://arxiv.org/pdf/1711.06897.pdf
RetinaNet500 seems to be competitive [34.4 AP and 14.7 AP (small)] with RefineDet512 and also with Faster-RCNN by G-RMI. It is also much better than SSD512.
Based on this I would conclude that RetinaNet500 is a good choice.
(RetinaNet800 has more complexity, and the models marked with a + use multi-scale testing, so they are not an apples-to-apples comparison.)