pytorch-image-models
Swin-V2 as backbone for One-Stage Detector has low precision and recall for area==medium
I am using Swin Transformer V2 as the backbone of a one-stage detector for object detection. The overall metrics are competitive, but mAP@0.5:0.95 is lower than expected because precision and recall for area=medium are very low (AP 0.125, versus 0.700 for the YOLO results below). Any insights on that? Does using Swin Transformer V2 as the backbone have a direct effect on this?
Swin-YOLO
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.589
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.747
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.703
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.125 <---------low
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.589
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.736
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.809
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.809
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.500 <---------low
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.809
YOLO
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.660
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.793
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.759
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.700 <---------high
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.661
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.795
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.842
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.842
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.700 <---------high
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.842
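A note on reading these tables: in the COCO evaluation output, a value of -1.000 means there were no ground-truth objects in that area range, so area=small is empty in this validation set. If the medium range contains only a handful of boxes, its AP and AR can swing a lot between models. A minimal sketch for counting ground-truth boxes per area range, assuming the default COCO thresholds (32² and 96² pixels) and boxes given as `[x, y, w, h]`; the function name and the sample boxes are hypothetical, and COCO itself uses each annotation's `area` field rather than `w * h`:

```python
from collections import Counter

# Default COCO area thresholds in pixels squared (assumed unchanged in the evaluator).
SMALL_MAX = 32 ** 2   # 1024
MEDIUM_MAX = 96 ** 2  # 9216


def count_by_area(boxes):
    """Count boxes per COCO area range; boxes are [x, y, w, h] in pixels.

    Boundary handling is simplified here: each box lands in exactly one bucket.
    """
    counts = Counter({"small": 0, "medium": 0, "large": 0})
    for _x, _y, w, h in boxes:
        area = w * h  # approximation of the annotation's 'area' field
        if area < SMALL_MAX:
            counts["small"] += 1
        elif area < MEDIUM_MAX:
            counts["medium"] += 1
        else:
            counts["large"] += 1
    return dict(counts)


# Hypothetical ground-truth boxes, for illustration only.
sample_boxes = [(0, 0, 10, 10), (0, 0, 50, 50), (0, 0, 100, 100), (0, 0, 40, 40)]
print(count_by_area(sample_boxes))
```

Running this over the validation annotations would show how many objects the area=medium numbers are actually computed from.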
Hi, how did you create your Swin-V2 model, and how do you extract features from it? My output has size torch.Size([1, 1024]), but I want torch.Size([1, 1024, 20, 20]).
Could you please share the code?
Thanks in advance.
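A [1, 1024] output is typically the globally pooled feature vector, so the spatial map has to be taken before pooling. In timm, `model.forward_features(x)` returns the unpooled features; depending on the timm version, Swin models appear to return them either as tokens of shape (B, H*W, C) or as a channels-last map of shape (B, H, W, C), so check the shape you actually get. The sketch below is not the original poster's code. It only shows the reshaping step with plain PyTorch, using a random tensor as a stand-in for the backbone output and assuming a 640x640 input with an overall stride of 32 (hence 20x20); the helper names are hypothetical.

```python
import torch


def tokens_to_feature_map(tokens: torch.Tensor, height: int, width: int) -> torch.Tensor:
    """Reshape a token sequence (B, H*W, C) into a feature map (B, C, H, W)."""
    batch, length, channels = tokens.shape
    if length != height * width:
        raise ValueError(f"expected {height * width} tokens, got {length}")
    # (B, L, C) -> (B, C, L) -> (B, C, H, W); tokens are assumed to be in row-major order.
    return tokens.transpose(1, 2).reshape(batch, channels, height, width)


def nhwc_to_nchw(features: torch.Tensor) -> torch.Tensor:
    """Convert a channels-last map (B, H, W, C) into (B, C, H, W)."""
    return features.permute(0, 3, 1, 2).contiguous()


# Stand-in for unpooled last-stage backbone features (assumption: 20x20 tokens, 1024 channels).
tokens = torch.randn(1, 20 * 20, 1024)
feature_map = tokens_to_feature_map(tokens, 20, 20)
print(feature_map.shape)  # torch.Size([1, 1024, 20, 20])
```

If the backbone already returns (B, H, W, C), `nhwc_to_nchw` alone is enough to get the layout a convolutional detection head expects.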