CrossStagePartialNetworks
Bad inference performance with CSPResNeXt50-PANet-SPP
Hi,
I've been evaluating CSPResNeXt50-PANet-SPP for real-time human detection. According to the README of this repository, CSPResNeXt50-PANet-SPP performs better than Yolov3 in AP on the COCO dataset.
In order to verify this result, I downloaded cfg and weights of CSPResNeXt50-PANet-SPP to compare it with Yolov3 (yolov3.cfg + yolov3.weights - result of COCO training).
As far as I could observe, CSPResNeXt50-PANet-SPP is not better than Yolov3 at least for my case of detecting humans in video streams. Here is an example image of results of both networks:
- Inference result with CSPResNeXt50-PANet-SPP:
- Inference result with Yolov3:
My question is whether these images represent a special case where CSPResNeXt50-PANet-SPP may perform worse than Yolov3, for instance on small objects like the humans in the given images. Or what is the best way to explain this result?
Thanks in advance.
You have provided too little information. Also, you didn't give the source image for reproducing this issue. Maybe you are using CSPResNeXt50-PANet-SPP and Yolov3 with different network resolutions, or you are doing something wrong.
- attach source image
- show the detection result using cfg https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/csresnext50-panet-spp-original-optimal.cfg and weights https://drive.google.com/open?id=1_NnfVgj0EDtb_WLNoXV8Mo7WKgwdYZCc
The most likely situation is that the default input size of CSPResNeXt50-PANet-SPP is 416 and the default input size of YOLOv3 (https://pjreddie.com/darknet/yolo/) is 608, if you just download the cfg/weights and test them.
Also, please note that the big bounding box is a correct detection, since the COCO dataset labels such regions with the iscrowd tag. https://github.com/ultralytics/yolov3/issues/714#issuecomment-565570001 https://github.com/ultralytics/yolov3/issues/714#issuecomment-565657113
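To check the input-size mismatch described above before comparing two models, the network resolution can be read from the [net] section of each cfg. The following is only a minimal sketch: the helper name `read_net_size` and the two cfg fragments are hypothetical stand-ins, not the real files, and the parser assumes simple `key=value` lines with `#` comments.

```python
def read_net_size(cfg_text):
    """Return (width, height) from the [net] section of a Darknet-style cfg.

    Values that are missing come back as None.
    """
    size = {}
    in_net = False
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("["):
            # Track whether we are inside the [net] section
            in_net = line.lower() == "[net]"
            continue
        if in_net and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key in ("width", "height"):
                size[key] = int(value)
    return size.get("width"), size.get("height")


# Hypothetical cfg fragments used only for illustration
csp_cfg = "[net]\nbatch=64\nwidth=416\nheight=416\n[convolutional]\nfilters=32\n"
yolo_cfg = "[net]\nwidth=608\nheight=608\n"

print(read_net_size(csp_cfg), read_net_size(yolo_cfg))
```

If the two sizes differ, the comparison is not like for like, and the model with the smaller input will tend to miss small objects.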
Hi @AlexeyAB and @WongKinYiu
As @AlexeyAB suggested, I downloaded the cfg and weights file from the links provided in the post. I used a video for inference and that picture is from the first second of that video. The command I used was:
./darknet detector demo data/coco.data csresnext50-panet-spp-original-optimal.cfg csresnext50-panet-spp-original-optimal_final.weights sys6_day.mp4 -out_filename sys6_day_predictions.mp4 -dont_show
The source video is at this URL: https://drive.google.com/open?id=1vquc2v9jpA5WkOEkvsNqMGCb-5XI7GXw
As a result of this inference, the same frame now looks like this:
According to this result, the Yolov3 inference still looks better. To me, it is better in most parts of the video, not just in this example frame. Any ideas?
@ekarabulut
Try to run detection with flag -thresh 0.1
./darknet detector demo cfg/coco.data cfg/csresnext50-panet-spp-original-optimal.cfg csresnext50-panet-spp-original-optimal_final.weights sys6_day.mp4 -thresh 0.1
> According to this result, the Yolov3 inference still looks better. To me, it is better in most parts of the video, not just in this example frame. Any ideas?
It depends on the dataset. On MS COCO, the model csresnext50-panet-spp-original-optimal.cfg works better than yolov3.cfg on AP50, AP, and APsmall, and especially for the person class:
- yolov3.cfg:
- csresnext50-panet-spp-original-optimal.cfg:
- yolov3.cfg: AP50=0.579, AP=0.330, APsmall=0.183
- csresnext50-panet-spp.cfg: AP50=0.606, AP=0.384, APsmall=0.221
- csresnext50-panet-spp-original-optimal.cfg: AP50=0.644, AP=0.424, APsmall=0.232
@WongKinYiu Did you use darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 608 -height 608 to recalculate the anchors for csresnext50-panet-spp-original-optimal.cfg?
Did you change the anchors manually?
@AlexeyAB
No, I just use [original_anchors*512/416].
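The scaling rule quoted above can be sketched as follows. This is an illustration under assumptions: it takes the standard yolov3.cfg anchors as `original_anchors`, and it rounds to the nearest integer, which may not match how the values in the actual cfg were rounded. The function name `scale_anchors` is hypothetical.

```python
def scale_anchors(anchors, old_size, new_size):
    """Scale (width, height) anchor pairs from old_size to new_size."""
    # Multiply before dividing so exact ratios stay exact in floating point
    return [
        (round(w * new_size / old_size), round(h * new_size / old_size))
        for w, h in anchors
    ]


# Assumed starting point: the anchors listed in the standard yolov3.cfg
yolov3_anchors = [
    (10, 13), (16, 30), (33, 23),
    (30, 61), (62, 45), (59, 119),
    (116, 90), (156, 198), (373, 326),
]

scaled = scale_anchors(yolov3_anchors, 416, 512)

# Print in the comma-separated form used by the anchors= line of a cfg
print(", ".join(f"{w},{h}" for w, h in scaled))
```

Because every anchor grows by the factor 512/416, objects that matched the smallest anchor at 416 need a correspondingly larger input to be matched as well.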
@ekarabulut hello,
The reason for https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/7#issuecomment-572707936 is that it uses larger anchors, so you need a larger input size to detect small objects.
The reason for https://github.com/WongKinYiu/CrossStagePartialNetworks/issues/7#issue-547045138 is mainly that the input sizes are different.
But there is another important reason: yolov3 is trained with letter_box while cspresnext50-panet-spp is trained with resize, and images in coco usually have larger height than width, so using -letter_box can solve the problem.
(The best solution is to re-train cspresnext50-panet-spp with letter_box=1.)
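The difference between the two preprocessing modes can be illustrated with a small geometry sketch. It is not taken from the darknet source: the function names are hypothetical, the 1920x1080 frame size is an assumed example (the resolution of the video in this thread is not stated), and the rounding and padding may differ from the real implementation.

```python
def letterbox_geometry(img_w, img_h, net_w, net_h):
    """Aspect-preserving fit: return (new_w, new_h, pad_x, pad_y)."""
    # Integer comparison of net_w/img_w vs net_h/img_h avoids float error
    if net_w * img_h <= net_h * img_w:
        # Width is the limiting side
        new_w = net_w
        new_h = img_h * net_w // img_w
    else:
        # Height is the limiting side
        new_h = net_h
        new_w = img_w * net_h // img_h
    pad_x = (net_w - new_w) // 2
    pad_y = (net_h - new_h) // 2
    return new_w, new_h, pad_x, pad_y


def resize_scales(img_w, img_h, net_w, net_h):
    """Plain resize: independent x and y scale factors (aspect not kept)."""
    return net_w / img_w, net_h / img_h


# Assumed 16:9 video frame fed to a 512x512 network
print(letterbox_geometry(1920, 1080, 512, 512))
sx, sy = resize_scales(1920, 1080, 512, 512)
print(round(sy / sx, 3))  # how much plain resize stretches height vs width
```

With plain resize, a wide frame is stretched vertically, so people appear taller and thinner than in letterboxed input; a model sees object shapes that differ from those it was trained on unless training and inference use the same mode.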
After changing the input size to fit the anchors and testing with -letter_box, csresnext50-panet-spp-original-optimal gets the following results:
Compared with Yolov3:
2. weights drive.google.com/open?id=1_NnfVgj0EDtb_WLNoXV8Mo7WKgwdYZCc
I get a 404 error for the weights link: drive.google.com/open?id=1_NnfVgj0EDtb_WLNoXV8Mo7WKgwdYZCc