
Why is the mAP lower after distillation on VOC?

cmjkqyl opened this issue 1 year ago · 1 comment

I trained and tested RetinaNet on the VOC dataset and found that the accuracy of RetinaNet after distillation with FGD is even lower than that of the original RetinaNet, as if the dark knowledge had a negative impact on the network. A related observation: when I do not use pre-trained weights to initialize the student network, FGD does give a significant improvement in accuracy (compared to the original RetinaNet trained with pre-training turned off as well). The teacher network is RX101 and the student network is R50. The configuration file and training log are as follows. I have modified train.py and detection_distiller.py according to MGD's files.

Could you give me some suggestions about this? Thanks!

mmdet==2.18 mmcv-full==1.4.3
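
For reference, here is a minimal sketch of how the student backbone's ImageNet initialization is typically toggled in an mmdet 2.x config. This is an illustrative snippet, not my exact file; only the backbone init is shown, and the `init_cfg` values follow the standard mmdet RetinaNet-R50 settings:

```python
# Illustrative mmdet 2.x style snippet: switching the student backbone's
# ImageNet pre-training on or off. The surrounding config (neck, head,
# dataset, schedule) is omitted for brevity.
model = dict(
    type='RetinaNet',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        style='pytorch',
        # With ImageNet pre-trained weights (the default in mmdet's retinanet_r50 configs):
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'),
        # Without pre-training, replace the line above with:
        # init_cfg=None,
    ),
)
```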

— cmjkqyl, Jun 11 '23

Hi, when you don't use pre-trained weights in the student model with FGD, do you obtain better results than when you use them? I suspect that pre-trained weights may have a negative impact on knowledge distillation.

— xiaobiaodu, Jul 10 '23