LD
About Adaptive Layer
I have some questions about adaptive layers when training with KD.
- When you combined your KD method with other intermediate feature map KD methods, you had to use adaptive layers to upscale the student feature maps. Were these adaptive layers trained together with the student, or did you freeze them? I've read a lot of papers and found nothing written about this.
- These adaptive layers may sometimes distort the student's output feature map, and they don't contribute to the student's inference process. So why do adaptive layers make KD training work effectively? I would expect them to decrease the mAP.
Could you please explain? Thank you very much.
An adaptive layer is used when the student feature map and the teacher feature map don't match. Many KD papers use the FPN as the learning target, and the FPN layers of teacher and student mostly have matching feature maps, so no adaptive layer is needed (including in ours). That's why we don't mention it.
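For illustration only, here is a minimal sketch of what such an adaptive layer could look like when the channel counts differ. It assumes PyTorch and a 1x1 convolution as the adapter; `ChannelAdapter`, `feature_kd_loss`, the MSE loss, and the channel counts (128 and 256) are hypothetical choices and are not taken from the LD code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelAdapter(nn.Module):
    """Hypothetical 1x1-conv adapter: maps student channels to teacher channels."""

    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat: torch.Tensor) -> torch.Tensor:
        return self.proj(student_feat)


def feature_kd_loss(student_feat, teacher_feat, adapter):
    """MSE between adapted student features and detached teacher features."""
    adapted = adapter(student_feat)
    # Resize spatially only if the two maps differ in height/width.
    if adapted.shape[-2:] != teacher_feat.shape[-2:]:
        adapted = F.interpolate(
            adapted, size=teacher_feat.shape[-2:], mode="bilinear", align_corners=False
        )
    # The teacher is a fixed target, so no gradient flows into it.
    return F.mse_loss(adapted, teacher_feat.detach())


torch.manual_seed(0)
# Random stand-ins for real feature maps (assumed shapes: N, C, H, W).
student_feat = torch.randn(2, 128, 32, 32, requires_grad=True)
teacher_feat = torch.randn(2, 256, 32, 32)
adapter = ChannelAdapter(128, 256)

loss = feature_kd_loss(student_feat, teacher_feat, adapter)
loss.backward()  # gradients reach both the adapter and the student features
```

In a sketch like this, the adapter's parameters would be passed to the optimizer alongside the student's so that both are trained jointly, as in FitNets-style hint training, and the adapter would simply be left out when the student is deployed.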
Oh, I see. In my work, I have to use adaptive layers because the student and teacher have different numbers of channels, and I think that makes the student's mAP drop slightly.