CrossStagePartialNetworks

Bad inference performance with CSPResNeXt50-PANet-SPP

Open ekarabulut opened this issue 4 years ago • 8 comments

Hi,

I've been evaluating CSPResNeXt50-PANet-SPP for real-time human detection. According to the README of this repository, CSPResNeXt50-PANet-SPP achieves a higher AP than Yolov3 on the COCO dataset.

To verify this result, I downloaded the cfg and weights of CSPResNeXt50-PANet-SPP and compared it with Yolov3 (yolov3.cfg + yolov3.weights, trained on COCO).

As far as I can tell, CSPResNeXt50-PANet-SPP is not better than Yolov3, at least for my case of detecting humans in video streams. Here is an example frame with the results of both networks:

  1. Inference result with CSPResNeXt50-PANet-SPP: CSPResNeXt50-PANet-SPP_detection

  2. Inference result with Yolov3: yolov3_detection

My question is whether these images represent a special case in which CSPResNeXt50-PANet-SPP performs worse than Yolov3, for instance with small objects such as the humans in these images. If not, what is the best explanation for this result?

Thanks in advance.

ekarabulut avatar Jan 08 '20 19:01 ekarabulut

You have provided too little information, and you didn't attach the source image needed to reproduce this issue. Maybe you are running CSPResNeXt50-PANet-SPP and Yolov3 at different network resolutions, or you are doing something else wrong.

  1. Attach the source image.
  2. Show the detection result using cfg https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/csresnext50-panet-spp-original-optimal.cfg and weights https://drive.google.com/open?id=1_NnfVgj0EDtb_WLNoXV8Mo7WKgwdYZCc

AlexeyAB avatar Jan 08 '20 21:01 AlexeyAB

The most likely explanation is that the default input size of CSPResNeXt50-PANet-SPP is 416, while the default input size of YOLOv3 (https://pjreddie.com/darknet/yolo/) is 608, if you just download the cfg/weights and test them.

Also, please note that the big bounding box is a correct detection, since the COCO dataset labels such regions with the iscrowd tag. https://github.com/ultralytics/yolov3/issues/714#issuecomment-565570001 https://github.com/ultralytics/yolov3/issues/714#issuecomment-565657113
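A quick way to check for the input-size mismatch described above is to read `width` and `height` from the `[net]` section of each cfg. The sketch below is only an illustration: the two cfg excerpts are hypothetical, abbreviated stand-ins (not the real files), using the 416 and 608 values mentioned in this comment, and `read_net_size` is a helper name made up for this example.

```python
def read_net_size(cfg_text):
    """Return (width, height) from the [net] section of a Darknet-style cfg."""
    in_net = False
    values = {}
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("["):
            if in_net:
                break  # reached the section after [net]
            in_net = line.lower() == "[net]"
            continue
        if in_net and "=" in line:
            key, value = line.split("=", 1)
            values[key.strip()] = value.strip()
    return int(values["width"]), int(values["height"])


# Hypothetical, abbreviated excerpts; the real cfg files are much longer.
CSP_CFG = """
[net]
batch=64
width=416
height=416

[convolutional]
filters=64
"""

YOLOV3_CFG = """
[net]
width=608
height=608

[convolutional]
filters=32
"""

csp_size = read_net_size(CSP_CFG)
yolo_size = read_net_size(YOLOV3_CFG)
print("CSP:", csp_size, "YOLOv3:", yolo_size)
if csp_size != yolo_size:
    print("Input sizes differ; compare both models at the same width/height.")
```

To run it on real files, pass the file contents, e.g. `read_net_size(open("yolov3.cfg").read())`.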

WongKinYiu avatar Jan 08 '20 23:01 WongKinYiu

Hi @AlexeyAB and @WongKinYiu

As @AlexeyAB suggested, I downloaded the cfg and weights files from the links provided in his post. I used a video for inference, and the picture is from the first second of that video. The command I used was:

./darknet detector demo data/coco.data csresnext50-panet-spp-original-optimal.cfg csresnext50-panet-spp-original-optimal_final.weights sys6_day.mp4 -out_filename sys6_day_predictions.mp4 -dont_show

The source video is at this URL: https://drive.google.com/open?id=1vquc2v9jpA5WkOEkvsNqMGCb-5XI7GXw

As a result of this inference, the same frame now looks like this: Screenshot from 2020-01-09 21-55-37

Based on this result, the Yolov3 inference still looks better. To me, it is better in most parts of the video, not just in this example frame. Any ideas?

ekarabulut avatar Jan 09 '20 19:01 ekarabulut

@ekarabulut

Try running detection with the flag -thresh 0.1:

./darknet detector demo cfg/coco.data cfg/csresnext50-panet-spp-original-optimal.cfg csresnext50-panet-spp-original-optimal_final.weights sys6_day.mp4 -thresh 0.1


> According to this result, Yolov3 inference still looks like better. To me, it is better in most parts of the video not just for this example frame. Any ideas?

It depends on the dataset. On MS COCO, the model csresnext50-panet-spp-original-optimal.cfg performs better than yolov3.cfg on AP50, AP, and APsmall, and especially for the person class:

  • yolov3.cfg: yolov3_person

  • csresnext50-panet-spp-original-optimal.cfg: csp_pan_person


  • yolov3.cfg: AP50=0.579, AP=0.330, APsmall=0.183
  • csresnext50-panet-spp.cfg: AP50=0.606, AP=0.384, APsmall=0.221
  • csresnext50-panet-spp-original-optimal.cfg: AP50=0.644, AP=0.424, APsmall=0.232

AlexeyAB avatar Jan 09 '20 20:01 AlexeyAB

@WongKinYiu Did you use darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 608 -height 608 to recalculate the anchors for csresnext50-panet-spp-original-optimal.cfg? Or did you change the anchors manually?

AlexeyAB avatar Jan 09 '20 20:01 AlexeyAB

@AlexeyAB

No, I just used [original_anchors*512/416].
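The scaling rule [original_anchors*512/416] can be sketched as follows. This is only an illustration under stated assumptions: the standard yolov3.cfg anchors are used as a stand-in for "original_anchors" (the thread does not list the actual values), and rounding to the nearest integer is a guess, since the comment does not say how fractional results were handled.

```python
def scale_anchors(anchors, src_size, dst_size):
    """Scale (width, height) anchor pairs from src_size to dst_size input."""
    factor = dst_size / src_size
    # Assumption: round to the nearest integer; the thread does not specify.
    return [(round(w * factor), round(h * factor)) for w, h in anchors]


# Assumption: the standard yolov3.cfg anchors stand in for "original_anchors".
ORIGINAL_ANCHORS = [
    (10, 13), (16, 30), (33, 23),
    (30, 61), (62, 45), (59, 119),
    (116, 90), (156, 198), (373, 326),
]

scaled = scale_anchors(ORIGINAL_ANCHORS, 416, 512)
# Print in the comma-separated form used by the "anchors =" line of a cfg.
print(", ".join(f"{w},{h}" for w, h in scaled))
```

Because every anchor grows by the same factor (512/416, about 1.23), the smallest anchor also grows, which is consistent with the later remark that a larger input size is needed to detect small objects.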

WongKinYiu avatar Jan 09 '20 22:01 WongKinYiu

@ekarabulut hello,

the reason for https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/7#issuecomment-572707936 is that it uses larger anchors, so you need to use a larger input size to detect small objects.

and the reason for https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/7#issue-547045138 is mainly that the input sizes are different. But there is also another important reason: yolov3 is trained with letter_box, while cspresnext50-panet-spp is trained with resize, and images in coco usually have larger height than width, so using -letter_box can solve the problem. (The best solution is to re-train cspresnext50-panet-spp with letter_box=1.)

After changing the input size to fit the anchors and testing with -letter_box, csresnext50-panet-spp-original-optimal gets the following result: predictions
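The difference between plain resize and letterboxing can be illustrated with a little geometry. The sketch below is not Darknet's implementation; it only shows the general idea, and the 1920x1080 frame size and 608x608 network size are hypothetical values chosen for the example.

```python
def resize_scales(img_w, img_h, net_w, net_h):
    """Plain resize: each axis is stretched independently to the network size."""
    return net_w / img_w, net_h / img_h


def letterbox_geometry(img_w, img_h, net_w, net_h):
    """Letterbox: one scale for both axes, with the remainder filled by padding."""
    scale = min(net_w / img_w, net_h / img_h)
    new_w = round(img_w * scale)
    new_h = round(img_h * scale)
    pad_x = (net_w - new_w) // 2  # padding on the left (and roughly the right)
    pad_y = (net_h - new_h) // 2  # padding on the top (and roughly the bottom)
    return scale, new_w, new_h, pad_x, pad_y


# Hypothetical example: a 1920x1080 video frame fed to a 608x608 network.
sx, sy = resize_scales(1920, 1080, 608, 608)
print(f"resize: x scale {sx:.3f}, y scale {sy:.3f} (aspect ratio changes)")

scale, new_w, new_h, pad_x, pad_y = letterbox_geometry(1920, 1080, 608, 608)
print(f"letterbox: scale {scale:.3f}, image {new_w}x{new_h}, pad x={pad_x}, y={pad_y}")
```

With plain resize the two scale factors differ, so objects are distorted relative to how they appeared during training; with letterboxing the aspect ratio is preserved. A model tends to work best when test-time preprocessing matches the one used in training, which is why re-training with letter_box=1 is suggested as the best solution.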

Compare with Yolov3: yolov3_detection

WongKinYiu avatar Jan 10 '20 00:01 WongKinYiu