Object-Detection-Knowledge-Distillation-ICLR2021
Concern about the performance
Hi, thanks for your great work. Following your supplementary materials, I used RetinaNet-ResNet101 to distill RetinaNet-ResNet50 and plotted the mAP curve. It shows that the KD method outperforms the student early on, but it rises slowly later in training and ends up with nearly no gain over the student model. Is this a common phenomenon, or is something wrong?
Hi @yarkable, maybe you can try training the detectors for more epochs. In my experience, KD performs much better with a 2x training schedule (24 epochs).
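For reference, a minimal sketch of what a 2x schedule looks like in an mmdetection-style config. This assumes the repo follows mmdetection conventions (not confirmed in this thread); the field values below match mmdetection's standard `schedule_2x`:

```python
# Hypothetical mmdetection-style 2x schedule (assumption: this repo
# uses mmdetection's config system). 24 epochs total, with the
# learning rate stepped down at epochs 16 and 22.
lr_config = dict(
    policy='step',
    warmup='linear',      # linear LR warmup for the first iterations
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[16, 22])        # LR decay points (epochs)
runner = dict(type='EpochBasedRunner', max_epochs=24)
```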
Yep, I went back to your paper, and it seems you use the 2x schedule in all your experiments. Is that common practice for distillation in object detection?
@yarkable Did you get the expected gain with 2x training?
@lji72 Haven't tried yet.