Distilling-Object-Detectors
VGG16 (Teacher) mAP is lower than reported
We used the pretrained VGG16 weights from the shared drive link and tested their accuracy with the test_net.py script, but the mAP obtained is 66.06, while the reported mAP for the teacher model is 70.1.
The detailed output is shown below:
AP for aeroplane = 0.6706
AP for bicycle = 0.7505
AP for bird = 0.6550
AP for boat = 0.4441
AP for bottle = 0.4730
AP for bus = 0.7368
AP for car = 0.7654
AP for cat = 0.8224
AP for chair = 0.4576
AP for cow = 0.6890
AP for diningtable = 0.6088
AP for dog = 0.7978
AP for horse = 0.8124
AP for motorbike = 0.7429
AP for person = 0.7343
AP for pottedplant = 0.3794
AP for sheep = 0.5846
AP for sofa = 0.6520
AP for train = 0.7236
AP for tvmonitor = 0.7121
Mean AP = 0.6606
Please let us know if we are missing something. How can we exactly reproduce the reported mAP of the teacher model?
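As a quick sanity check on the numbers above, the per-class APs do average to the printed mean (a throwaway snippet, not part of the repo):

```python
# Per-class APs copied from the test_net.py output above.
aps = [0.6706, 0.7505, 0.6550, 0.4441, 0.4730, 0.7368, 0.7654, 0.8224,
       0.4576, 0.6890, 0.6088, 0.7978, 0.8124, 0.7429, 0.7343, 0.3794,
       0.5846, 0.6520, 0.7236, 0.7121]
print(f"Mean AP = {sum(aps) / len(aps):.4f}")  # Mean AP = 0.6606
```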
Hi, have you tested the student model?
Got the issue! I was using PyTorch 1.0, which was giving lower accuracy, but PyTorch 0.4.0 gave an mAP of 69.96.
Still, it is a bit lower than the reported 70.1.
The accuracy obtained on the teacher model is as follows:
AP for aeroplane = 0.7351
AP for bicycle = 0.7643
AP for bird = 0.6778
AP for boat = 0.5463
AP for bottle = 0.5159
AP for bus = 0.7796
AP for car = 0.8451
AP for cat = 0.8250
AP for chair = 0.4686
AP for cow = 0.7732
AP for diningtable = 0.6278
AP for dog = 0.8030
AP for horse = 0.8194
AP for motorbike = 0.7446
AP for person = 0.7709
AP for pottedplant = 0.4509
AP for sheep = 0.6978
AP for sofa = 0.6503
AP for train = 0.7604
AP for tvmonitor = 0.7364
Mean AP = 0.6996
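Since the gap traced back to the PyTorch version, a small guard like this (just a sketch, not part of the repo) can flag a mismatched environment before running test_net.py:

```python
import warnings
import torch

# The numbers in this thread were reproduced with PyTorch 0.4.0; 1.x gave lower mAP.
EXPECTED = "0.4.0"

if not torch.__version__.startswith(EXPECTED):
    warnings.warn(
        f"Running PyTorch {torch.__version__}; the reported mAP was obtained "
        f"with {EXPECTED} and may not be reproduced on other versions."
    )
```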
Sorry, maybe the uploaded model is not exactly the 70.1 mAP model.
Using PyTorch 1.2, I get the same result. However, the student model, using VGG11, gets the 0.68 mAP as reported.
Sorry, I cannot figure out what could possibly be wrong. I recommend training the teacher and student models directly; a VGG16 Faster R-CNN with ~70 mAP is easy to reproduce.
@twangnh @yuanli2333: Hi Wang & Li,
I appreciate that you updated the installation part of the README. While reading the code, I have a question: why is the output channel the same as the input channel, if the stu_adap function is meant to adapt the channels? Thanks in advance.
lib/model/faster_rcnn/vgg11.py
Because in this layer, the features of the student and teacher models have the same number of channels (512), but different spatial sizes.
@yuanli2333: Thanks for your quick reply, but according to out_size = (n - k + 2p)/s + 1, with the default s = stride = 1, your k = 3, and 2p = 2, we get out_size = input_size, so it seems the spatial size doesn't change either.
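For reference, here is a quick numeric check of that formula (a minimal sketch; the feature-map sizes are arbitrary, only k = 3, 2p = 2, s = 1 are taken from the comment above):

```python
# Plug the values quoted above into out_size = (n - k + 2p)/s + 1.
def conv_out_size(n, k=3, p=1, s=1):
    return (n - k + 2 * p) // s + 1

for n in (38, 50, 75):  # arbitrary example spatial sizes
    # With k=3, padding=1, stride=1 the output size equals the input size.
    assert conv_out_size(n) == n
```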
Right, the spatial size is the same for VGG16 and VGG11. But we still need one learnable layer to project/transform the teacher feature; the channels are also not an exact one-to-one mapping between teacher and student. Besides, not all teacher and student models have the same size.
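For illustration, here is a minimal sketch of an adaptation layer with the parameters discussed in this thread (512 channels, k = 3, padding = 1, stride = 1). This is an assumption for demonstration only; the actual stu_adap definition is in lib/model/faster_rcnn/vgg11.py:

```python
import torch
import torch.nn as nn

# Hypothetical adaptation layer built from the parameters discussed above;
# it preserves both channel count and spatial size, yet is still a learnable
# projection between the student and teacher feature spaces.
stu_adap = nn.Sequential(
    nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
    nn.ReLU(inplace=True),
)

student_feat = torch.randn(1, 512, 38, 50)  # dummy feature map
adapted = stu_adap(student_feat)
print(adapted.shape)  # torch.Size([1, 512, 38, 50]) -- same shape, learnable mixing
```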
Great, appreciate your reply.