Chen Pengguang
Yes, "higher level" here should indeed be defined in terms of resolution. When writing the paper, we thought of network depth in units of stages; generally, a stage is a group of network modules at the same resolution, so "higher level" means more stages rather than more convolutional layers. There is indeed some ambiguity, thanks for pointing it out! As for the effects of ABF and HCL, in my experience, how well a distillation method works does vary across datasets, so such a conclusion is reasonable.
Thanks for your reimplementation!

> By the way, the accuracy of the implementation in this paper for CIFAR-100 is too low. Did you refer to the implementation of other repositories...
Sorry for the late reply. I haven't tried KL loss. In my opinion, KL loss is good at handling 1D logits rather than large 2D features.
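For reference, here is a minimal sketch of the standard logit-level KL distillation loss (temperature-scaled, in the style of Hinton et al.'s KD); the temperature value is an assumed placeholder, not from our experiments:

```python
import torch.nn.functional as F

def kd_kl_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between softened teacher and student distributions.

    student_logits, teacher_logits: (batch, num_classes) 1D logit vectors.
    temperature: softening factor (4.0 is a common choice, assumed here).
    """
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # Multiply by T^2 to rescale gradients back to the original magnitude.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```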
Thank you for this question! Actually, we didn't pay much attention to this part. I'm not sure whether the performance would be higher if we aligned these positions. You are...
Hi, thank you for your interest in our work. For KD, we apply it to the classification branch. And the misalignment between the teacher's and the student's proposals does exist. We directly...
The process should be:

```
input -> student backbone -> student's proposal
                                     |
                                     v
input -> teacher backbone -> ROI Align -> teacher's output
```

We didn't follow the original...
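To make the diagram concrete, here is a minimal sketch of pooling the teacher's features with the student's proposals via torchvision's `roi_align`; all tensor shapes, the stride, and the output size below are hypothetical placeholders:

```python
import torch
from torchvision.ops import roi_align

# Hypothetical shapes: batch of 2 images, 256-channel feature map.
teacher_feat = torch.randn(2, 256, 64, 64)

# Student proposals in (batch_index, x1, y1, x2, y2) format, image coordinates.
student_proposals = torch.tensor([
    [0., 10., 10., 120., 120.],
    [1., 50., 40., 200., 180.],
])

# Pool teacher features with the *student's* proposals, so the teacher's and
# the student's ROI features describe the same regions and can be compared.
teacher_roi_feat = roi_align(
    teacher_feat,
    student_proposals,
    output_size=(7, 7),
    spatial_scale=1.0 / 8,  # assumed stride of this feature level
)
print(teacher_roi_feat.shape)  # torch.Size([2, 256, 7, 7])
```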
I'm not sure why previous works use FitNet this way. I guess the main reason is that one-stage training is easier to implement. And the results...
No. But the extra modules used during distillation are not used at inference time, so the student model is unchanged when it is actually deployed.
It depends on the shape of the features you use for distillation.
Sorry for the late reply. I have never tried it on YOLOX. But usually, you should transform the student's features to match the shape of the teacher's, as in the sketch below.
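For illustration, a minimal sketch of such a shape-matching adapter: a 1x1 conv to match the teacher's channel count plus bilinear interpolation to match its spatial size. The channel counts and shapes below are hypothetical, not taken from any specific model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAdapter(nn.Module):
    """Project student features to the teacher's channel count and
    resize them to the teacher's spatial resolution."""

    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # 1x1 conv changes the channel dimension without mixing spatial positions.
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        x = self.proj(student_feat)
        if x.shape[-2:] != teacher_feat.shape[-2:]:
            # Bilinear interpolation to the teacher's spatial size.
            x = F.interpolate(x, size=teacher_feat.shape[-2:],
                              mode="bilinear", align_corners=False)
        return x

# Hypothetical shapes: student 128ch at 40x40, teacher 256ch at 80x80.
adapter = FeatureAdapter(128, 256)
s = torch.randn(2, 128, 40, 40)
t = torch.randn(2, 256, 80, 80)
aligned = adapter(s, t)
loss = F.mse_loss(aligned, t)  # e.g., an L2 feature-distillation loss
```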