Distilling-Object-Detectors
VGG16 (Teacher) mAP is lower than reported
We used the pretrained VGG16 weights from the shared drive link and tested their accuracy with the test_net.py script, but the mAP obtained is 66.06, while the reported mAP for the teacher model is 70.1.
The detailed output is shown below:
AP for aeroplane = 0.6706
AP for bicycle = 0.7505
AP for bird = 0.6550
AP for boat = 0.4441
AP for bottle = 0.4730
AP for bus = 0.7368
AP for car = 0.7654
AP for cat = 0.8224
AP for chair = 0.4576
AP for cow = 0.6890
AP for diningtable = 0.6088
AP for dog = 0.7978
AP for horse = 0.8124
AP for motorbike = 0.7429
AP for person = 0.7343
AP for pottedplant = 0.3794
AP for sheep = 0.5846
AP for sofa = 0.6520
AP for train = 0.7236
AP for tvmonitor = 0.7121
Mean AP = 0.6606
Please let us know if we are missing something. How can we exactly reproduce the reported mAP of the teacher model?
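As a quick sanity check on the numbers above, the per-class APs do average to the printed mean (a throwaway snippet, not part of the repo):

```python
# Per-class APs copied from the test_net.py output above.
aps = [0.6706, 0.7505, 0.6550, 0.4441, 0.4730, 0.7368, 0.7654, 0.8224,
       0.4576, 0.6890, 0.6088, 0.7978, 0.8124, 0.7429, 0.7343, 0.3794,
       0.5846, 0.6520, 0.7236, 0.7121]
print(f"Mean AP = {sum(aps) / len(aps):.4f}")  # Mean AP = 0.6606
```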
Hi, have you tested the student model?
Got the issue! I was using PyTorch 1.0, which was giving lower accuracy, but PyTorch 0.4.0 gave an mAP of 69.96.
Still, it is a bit lower than the reported 70.1.
The accuracy obtained on the teacher model is as follows:
AP for aeroplane = 0.7351
AP for bicycle = 0.7643
AP for bird = 0.6778
AP for boat = 0.5463
AP for bottle = 0.5159
AP for bus = 0.7796
AP for car = 0.8451
AP for cat = 0.8250
AP for chair = 0.4686
AP for cow = 0.7732
AP for diningtable = 0.6278
AP for dog = 0.8030
AP for horse = 0.8194
AP for motorbike = 0.7446
AP for person = 0.7709
AP for pottedplant = 0.4509
AP for sheep = 0.6978
AP for sofa = 0.6503
AP for train = 0.7604
AP for tvmonitor = 0.7364
Mean AP = 0.6996
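Since the gap traced back to the PyTorch version, a small guard like this (just a sketch, not part of the repo) can flag a mismatched environment before running test_net.py:

```python
import warnings
import torch

# The numbers in this thread were reproduced with PyTorch 0.4.0; 1.x gave lower mAP.
EXPECTED = "0.4.0"

if not torch.__version__.startswith(EXPECTED):
    warnings.warn(
        f"Running PyTorch {torch.__version__}; the reported mAP was obtained "
        f"with {EXPECTED} and may not be reproduced on other versions."
    )
```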
Sorry, maybe the uploaded model is not exactly the 70.1 mAP model.
Using PyTorch 1.2, I get the same result. However, the student model, using VGG11, gets the 0.68 mAP as reported.
Sorry, I cannot figure out what could possibly be wrong. I recommend training the teacher and student models directly; a VGG16 Faster R-CNN with ~70 mAP is easy to reproduce.
@twangnh @yuanli2333: Hi Wang & Li,
I appreciate that you updated the installation part of the README. While reading the code, I have a question: why is the output channel the same as the input channel, if the stu_adap function is meant to adapt the channels? Thanks in advance.
lib/model/faster_rcnn/vgg11.py
Because in this layer, the features of the student and teacher models have the same number of channels (512), but different spatial sizes.
@yuanli2333: Thanks for your quick reply, but according to out_size = (n - k + 2p)/s + 1, with the default s = stride = 1, your k = 3, and 2p = 2, we get out_size = input_size, so it seems the spatial size doesn't change either.
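For reference, here is a quick numeric check of that formula (a minimal sketch; the feature-map sizes are arbitrary, only k = 3, 2p = 2, s = 1 are taken from the comment above):

```python
# Plug the values quoted above into out_size = (n - k + 2p)/s + 1.
def conv_out_size(n, k=3, p=1, s=1):
    return (n - k + 2 * p) // s + 1

for n in (38, 50, 75):  # arbitrary example spatial sizes
    # With k=3, padding=1, stride=1 the output size equals the input size.
    assert conv_out_size(n) == n
```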
Right, the spatial size is the same for VGG16 and VGG11. But we still need one learnable layer to project/transform the teacher feature; the channels are also not an exact one-to-one mapping between teacher and student. Besides, not all teacher and student models have the same size.
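For illustration, here is a minimal sketch of an adaptation layer with the parameters discussed in this thread (512 channels, k = 3, padding = 1, stride = 1). This is an assumption for demonstration only; the actual stu_adap definition is in lib/model/faster_rcnn/vgg11.py:

```python
import torch
import torch.nn as nn

# Hypothetical adaptation layer built from the parameters discussed above;
# it preserves both channel count and spatial size, yet is still a learnable
# projection between the student and teacher feature spaces.
stu_adap = nn.Sequential(
    nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
    nn.ReLU(inplace=True),
)

student_feat = torch.randn(1, 512, 38, 50)  # dummy feature map
adapted = stu_adap(student_feat)
print(adapted.shape)  # torch.Size([1, 512, 38, 50]) -- same shape, learnable mixing
```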
Great, appreciate your reply.