Distilling-Object-Detectors

VGG16 (teacher) mAP is lower than reported

Open gauravkrnayak opened this issue 4 years ago • 10 comments

We used the pretrained VGG16 weights from the Google Drive link that was shared and tested their accuracy using the test_net.py script, but the mAP obtained is 66.06, while the reported mAP of the teacher model is 70.1.

The detailed output is shown below:

AP for aeroplane = 0.6706
AP for bicycle = 0.7505
AP for bird = 0.6550
AP for boat = 0.4441
AP for bottle = 0.4730
AP for bus = 0.7368
AP for car = 0.7654
AP for cat = 0.8224
AP for chair = 0.4576
AP for cow = 0.6890
AP for diningtable = 0.6088
AP for dog = 0.7978
AP for horse = 0.8124
AP for motorbike = 0.7429
AP for person = 0.7343
AP for pottedplant = 0.3794
AP for sheep = 0.5846
AP for sofa = 0.6520
AP for train = 0.7236
AP for tvmonitor = 0.7121
Mean AP = 0.6606
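For reference, Mean AP on PASCAL VOC is just the unweighted mean of the 20 per-class APs, so the arithmetic above is internally consistent:

```python
# Sanity check: Mean AP is the unweighted mean of the 20 per-class APs above.
aps = [0.6706, 0.7505, 0.6550, 0.4441, 0.4730, 0.7368, 0.7654,
       0.8224, 0.4576, 0.6890, 0.6088, 0.7978, 0.8124, 0.7429,
       0.7343, 0.3794, 0.5846, 0.6520, 0.7236, 0.7121]
print(f"{sum(aps) / len(aps):.4f}")  # -> 0.6606
```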

Please let us know if we are missing something. How can we exactly reproduce the reported mAP of the teacher model?

gauravkrnayak avatar Nov 14 '19 11:11 gauravkrnayak

Hi, have you tested the student model?

twangnh avatar Nov 22 '19 07:11 twangnh

Found the issue! I was using PyTorch 1.0, which was giving lower accuracy. Using PyTorch 0.4.0 gives an mAP of 69.96.

Still, that is a bit lower than the reported 70.1.
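To avoid silently evaluating with an incompatible version, a small guard before running test_net.py can help. This is only a sketch; the 0.4.0 pin reflects the observation in this thread, not anything in the repo:

```python
# Hypothetical guard, run before evaluation: pin the PyTorch version that
# reproduces the reported numbers (per this thread, 0.4.0 gives ~69.96 mAP
# while 1.x gives ~66 mAP with the shared checkpoint).
import torch

REQUIRED = "0.4.0"
if not torch.__version__.startswith(REQUIRED):
    raise RuntimeError(
        "Expected PyTorch %s, got %s; newer versions give a lower mAP "
        "with this checkpoint." % (REQUIRED, torch.__version__)
    )
```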

The APs obtained with the teacher model are as follows:

AP for aeroplane = 0.7351
AP for bicycle = 0.7643
AP for bird = 0.6778
AP for boat = 0.5463
AP for bottle = 0.5159
AP for bus = 0.7796
AP for car = 0.8451
AP for cat = 0.8250
AP for chair = 0.4686
AP for cow = 0.7732
AP for diningtable = 0.6278
AP for dog = 0.8030
AP for horse = 0.8194
AP for motorbike = 0.7446
AP for person = 0.7709
AP for pottedplant = 0.4509
AP for sheep = 0.6978
AP for sofa = 0.6503
AP for train = 0.7604
AP for tvmonitor = 0.7364
Mean AP = 0.6996

gauravkrnayak avatar Nov 25 '19 05:11 gauravkrnayak

Sorry, maybe the shared model is not exactly the 70.1 mAP model.

twangnh avatar Nov 25 '19 07:11 twangnh

Using PyTorch 1.2, I get the same lower result (66.06 mAP) for the teacher. However, the student model, using VGG11, gets the 0.68 mAP as reported.

raytrun avatar Jan 06 '20 09:01 raytrun

Sorry, I cannot figure out what could possibly be wrong. I recommend directly training the teacher and student models; a VGG16 Faster R-CNN with ~70 mAP is easy to reproduce.

twangnh avatar Jan 16 '20 10:01 twangnh

@twangnh @yuanli2333: Hi, Wang & Li,

I appreciate that you updated the installation part of the README. While reading the code, I have a question: if the stu_adap function is meant to adapt the channels, why is its output channel count the same as its input channel count? Thanks in advance.

lib/model/faster_rcnn/vgg11.py

stu_adap

chumingqian avatar Aug 15 '20 08:08 chumingqian

Because in this layer, the features of the student and teacher models have the same number of channels (512) but different spatial sizes.

yuanli2333 avatar Aug 15 '20 08:08 yuanli2333
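A minimal sketch of an adaptation layer of the kind being discussed (assumptions: this is not the repo's exact stu_adap code, and the feature map sizes are made up). A 3x3 convolution with stride 1 and padding 1 preserves both the 512 channels and the spatial size:

```python
# Sketch of a stu_adap-style adaptation layer (not the repo's exact code).
# A 3x3 conv with stride 1 and padding 1 keeps channels and spatial size
# unchanged, while adding a learnable projection between student and
# teacher features.
import torch
import torch.nn as nn

stu_adap = nn.Sequential(
    nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
    nn.ReLU(inplace=True),
)

student_feat = torch.randn(1, 512, 38, 50)  # hypothetical VGG11 feature map
adapted = stu_adap(student_feat)
assert adapted.shape == student_feat.shape  # (n - 3 + 2) / 1 + 1 == n
```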

@yuanli2333: Thanks for your quick reply, but according to out_size = (n - k + 2p)/s + 1, with the default stride s = 1, your k = 3, and 2p = 2, we get out_size = input_size, so it seems the spatial size does not change either.

chumingqian avatar Aug 15 '20 11:08 chumingqian
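The arithmetic in the question checks out; a quick check confirms that with k = 3, p = 1 (so 2p = 2), and s = 1, the output size equals the input size for any n:

```python
# Verify the output-size formula from the question above:
# out_size = (n - k + 2p) / s + 1, with k=3, p=1, s=1.
def conv_out_size(n, k=3, p=1, s=1):
    return (n - k + 2 * p) // s + 1

assert all(conv_out_size(n) == n for n in (7, 14, 38, 50))
print("spatial size unchanged, as observed")
```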

Right, the spatial size is the same for VGG16 and VGG11. But we still need one learnable layer to project/transform the features, and the channels are not an exact one-to-one mapping between teacher and student. Besides, not all teacher and student models have the same feature size.

yuanli2333 avatar Aug 16 '20 07:08 yuanli2333

Great, appreciate your reply.

chumingqian avatar Aug 17 '20 00:08 chumingqian