Chen Pengguang
Yes, "higher level" here should indeed be defined in terms of resolution. When writing the paper, we thought of network depth in units of stages; generally, a stage is a group of network modules at the same resolution, so "higher level" means more stages rather than more convolutional layers. There is indeed some ambiguity, thanks for pointing it out! As for the effects of ABF and HCL, in my experience, how well a distillation method works does vary across datasets, so such a conclusion is reasonable.
Thanks for your reimplementation!

> By the way, the accuracy of the implementation in this paper for CIFAR-100 is too low. Did you refer to the implementation of other repositories...
Sorry for the late reply. I haven't tried KL loss. In my opinion, KL loss is good at handling 1D logits rather than large 2D features.
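For reference, here is a minimal sketch of the standard logit-level KL distillation loss (temperature-scaled, in the style of Hinton et al.'s KD); the temperature value is an assumed placeholder, not from our experiments:

```python
import torch.nn.functional as F

def kd_kl_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between softened teacher and student distributions.

    student_logits, teacher_logits: (batch, num_classes) 1D logit vectors.
    temperature: softening factor (4.0 is a common choice, assumed here).
    """
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # Multiply by T^2 to rescale gradients back to the original magnitude.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```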
Thank you for this question! Actually, we didn't pay much attention to this part. I'm not sure whether the performance would be higher if we aligned these positions. You are...
Hi, thank you for your interest in our work. For KD, we apply it to the classification branch. And the misalignment between the teacher's and the student's proposals does exist. We directly...
The process should be:

```
input -> student backbone -> student's proposal
                                     |
                                     v
input -> teacher backbone -> ROI Align -> teacher's output
```

We didn't follow the original...
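To make the diagram concrete, here is a minimal sketch of pooling the teacher's features with the student's proposals via torchvision's `roi_align`; all tensor shapes, the stride, and the output size below are hypothetical placeholders:

```python
import torch
from torchvision.ops import roi_align

# Hypothetical shapes: batch of 2 images, 256-channel feature map.
teacher_feat = torch.randn(2, 256, 64, 64)

# Student proposals in (batch_index, x1, y1, x2, y2) format, image coordinates.
student_proposals = torch.tensor([
    [0., 10., 10., 120., 120.],
    [1., 50., 40., 200., 180.],
])

# Pool teacher features with the *student's* proposals, so the teacher's and
# the student's ROI features describe the same regions and can be compared.
teacher_roi_feat = roi_align(
    teacher_feat,
    student_proposals,
    output_size=(7, 7),
    spatial_scale=1.0 / 8,  # assumed stride of this feature level
)
print(teacher_roi_feat.shape)  # torch.Size([2, 256, 7, 7])
```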
I'm not sure why previous works use FitNet this way. I guess the main reason is that one-stage training is easier to implement. And the results...
No. But the extra modules used during distillation are not used at inference time, so the student model is unchanged when it is actually deployed.
It depends on the shape of the features you use for distillation.
Sorry for the late reply. I have never tried it on YOLOX. But usually, you should transform the student's features to match the shape of the teacher's, as in the sketch below.
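For illustration, a minimal sketch of such a shape-matching adapter: a 1x1 conv to match the teacher's channel count plus bilinear interpolation to match its spatial size. The channel counts and shapes below are hypothetical, not taken from any specific model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAdapter(nn.Module):
    """Project student features to the teacher's channel count and
    resize them to the teacher's spatial resolution."""

    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # 1x1 conv changes the channel dimension without mixing spatial positions.
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        x = self.proj(student_feat)
        if x.shape[-2:] != teacher_feat.shape[-2:]:
            # Bilinear interpolation to the teacher's spatial size.
            x = F.interpolate(x, size=teacher_feat.shape[-2:],
                              mode="bilinear", align_corners=False)
        return x

# Hypothetical shapes: student 128ch at 40x40, teacher 256ch at 80x80.
adapter = FeatureAdapter(128, 256)
s = torch.randn(2, 128, 40, 40)
t = torch.randn(2, 256, 80, 80)
aligned = adapter(s, t)
loss = F.mse_loss(aligned, t)  # e.g., an L2 feature-distillation loss
```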