FPN
Results on COCO 2014, with TFFRCNN baseline
Hi, I've been looking for a working TensorFlow implementation of FPN for some time now, and I think this one actually works :).
I'm using TFFRCNN to establish a baseline (this repo also seems to be a direct port of it, but I could be mistaken?). First I tried training and testing on PASCAL VOC 2007. Sadly, that didn't give an increase in accuracy (TFFRCNN reported 0.7 mAP and this repo 0.698 mAP; both were trained for 160k iterations), but the RPN loss during training was really good, so that gave me hope :)
But the COCO dataset seems to be a better candidate for testing this: first, because it is what the authors of the FPN paper report on, and second, because the COCO evaluation metrics are significantly more fine-grained.
Below are the results:
| Metric | IoU | Area | maxDets | TFFRCNN | FPN | Diff |
|--------|-----------|--------|---------|---------|------|-------|
| AP | 0.50:0.95 | all | 100 | 0.17 | 0.20 | +0.03 |
| AP | 0.50 | all | 100 | 0.34 | 0.37 | +0.03 |
| AP | 0.75 | all | 100 | 0.16 | 0.20 | +0.04 |
| AP | 0.50:0.95 | small | 100 | 0.03 | 0.08 | +0.05 |
| AP | 0.50:0.95 | medium | 100 | 0.18 | 0.23 | +0.05 |
| AP | 0.50:0.95 | large | 100 | 0.29 | 0.27 | -0.02 |
| AR | 0.50:0.95 | all | 1 | 0.19 | 0.21 | +0.02 |
| AR | 0.50:0.95 | all | 10 | 0.27 | 0.33 | +0.06 |
| AR | 0.50:0.95 | all | 100 | 0.27 | 0.33 | +0.06 |
| AR | 0.50:0.95 | small | 100 | 0.05 | 0.13 | +0.08 |
| AR | 0.50:0.95 | medium | 100 | 0.30 | 0.39 | +0.09 |
| AR | 0.50:0.95 | large | 100 | 0.47 | 0.47 | 0.00 |
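For anyone reading the table: AP at IoU=0.50 vs. IoU=0.75 only changes the overlap threshold a detection needs to count as a true positive. A minimal sketch of that IoU check (plain Python for illustration, not the pycocotools code that produced these numbers):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    xa1, ya1, xa2, ya2 = box_a
    xb1, yb1, xb2, yb2 = box_b
    # width/height of the intersection rectangle (clamped at 0 if disjoint)
    iw = max(0.0, min(xa2, xb2) - max(xa1, xb1))
    ih = max(0.0, min(ya2, yb2) - max(ya1, yb1))
    inter = iw * ih
    union = (xa2 - xa1) * (ya2 - ya1) + (xb2 - xb1) * (yb2 - yb1) - inter
    return inter / union

gt  = [0, 0, 100, 100]
det = [25, 0, 125, 100]  # detection shifted by a quarter of its width
print(iou(gt, det))      # 0.6: a TP at the 0.50 threshold, a FP at 0.75
```

So the gap between the IoU=0.50 and IoU=0.75 rows measures localization quality, not just classification.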
The paper uses a slightly different training and testing split (training_set + train35k for training and minival for testing; minival is only 5k images, while val is 40k images), but the relative differences between Faster R-CNN and FPN are in the same ballpark. We're seeing large increases in performance for small instances (both AR and AP), which is exactly what FPN sets out to do. So congrats, @xmyqsh :). The only result that's worse than TFFRCNN is for large instances, but that may be remedied by two things. First, I only used P3-P5 for the class/bbox heads (similar to the paper), but I see you now use P6 as well. Second, I accidentally used OHEM when training/testing TFFRCNN but not for FPN, so the comparison is actually not completely fair to FPN.
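On the P3-P5 vs. P6 point: which pyramid level an RoI is pooled from follows Eq. (1) of the FPN paper, k = floor(k0 + log2(sqrt(w*h)/224)), clamped to the levels the heads use, so large RoIs are exactly the ones affected by whether the clamp stops at P5 or P6. A sketch of the assignment (the level bounds here are the paper's, not necessarily what this repo uses):

```python
import math

def fpn_level(w, h, k0=4, k_min=3, k_max=5):
    """Map an RoI of size w x h to a pyramid level per FPN Eq. (1)."""
    # 224 is the canonical ImageNet pretraining size used as the reference scale
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / 224))
    return min(max(k, k_min), k_max)

print(fpn_level(224, 224))           # 4: a 224x224 RoI lands on P4
print(fpn_level(112, 112))           # 3: half the scale drops one level
print(fpn_level(896, 896))           # 5: clamped at P5
print(fpn_level(896, 896, k_max=6))  # 6: with P6 in the heads it goes higher
```

With k_max=5, every RoI above roughly 448 pixels on a side piles onto P5; allowing P6 spreads the largest ones onto a coarser, more appropriate level.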
I'm going to focus on implementing RoIAlign and attaching a mask head, so we can maybe replicate the results of Mask R-CNN.
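For reference, the core idea of RoIAlign is to replace RoIPool's coordinate rounding with bilinear sampling at exact sub-pixel locations. A minimal numpy sketch with one sample per output bin (Mask R-CNN averages several samples per bin, and this is not meant as the eventual TF implementation):

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly sample a (H, W) feature map at continuous coords (y, x)."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, feat.shape[0] - 1), min(x0 + 1, feat.shape[1] - 1)
    ly, lx = y - y0, x - x0  # fractional offsets inside the cell
    return ((1 - ly) * (1 - lx) * feat[y0, x0] + (1 - ly) * lx * feat[y0, x1]
            + ly * (1 - lx) * feat[y1, x0] + ly * lx * feat[y1, x1])

def roi_align(feat, roi, out_size=2):
    """Pool an RoI (y1, x1, y2, x2) in feature-map coords into out_size^2 bins,
    sampling each bin once at its center -- no coordinate rounding anywhere."""
    y1, x1, y2, x2 = roi
    bh, bw = (y2 - y1) / out_size, (x2 - x1) / out_size
    out = np.empty((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            out[i, j] = bilinear_sample(feat, y1 + (i + 0.5) * bh,
                                              x1 + (j + 0.5) * bw)
    return out

feat = np.arange(16, dtype=float).reshape(4, 4)
print(roi_align(feat, (0, 0, 4, 4)))  # samples at (1,1), (1,3), (3,1), (3,3)
```

In TensorFlow, `tf.image.crop_and_resize` is a common way to approximate this, since it also does bilinear interpolation at continuous coordinates.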
PS: @xmyqsh, should I open a pull request so that we can all train/test on COCO? (I just took the COCO dataset code from TFFRCNN and made a few changes to the training code so that images with no gt boxes in the training set are handled.)
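The no-gt-boxes fix can be as simple as dropping those images when building the roidb. A hypothetical sketch (the `boxes` field name follows TFFRCNN's roidb convention; the actual change in the PR may differ):

```python
def filter_roidb(roidb):
    """Drop roidb entries that contain no ground-truth boxes.

    Assumes each entry is a dict with a 'boxes' array, as in a
    TFFRCNN-style roidb -- field names are illustrative only.
    """
    kept = [entry for entry in roidb if len(entry['boxes']) > 0]
    print('filtered %d -> %d images' % (len(roidb), len(kept)))
    return kept
```

Filtering up front avoids special-casing empty anchor targets inside the training loop.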
@stillwalker1234 Welcome! I haven't implemented the COCO-related code yet, and I don't have good GPU resources for training on COCO efficiently, so you could try training on COCO with my implementation. By the way, I got 0.77+ mAP on the voc07 test set when training on voc2007 + voc2012. Adding P2 and P6 should boost the performance a lot.
Also, any improvement is welcome.