Sorry, it is because the released edition is not the code used in our paper. We will reproduce the results in our paper with the released code later. And we...
If you expand the KL divergence formula and drop the terms that do not depend on the student (the teacher-only part does not affect the gradient update, only the numerical value), you get exactly the form in the code.
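The point above can be checked numerically. A minimal sketch (the logits below are made-up illustration values, not from any repo): expanding KL(p_t || p_s) gives the cross-entropy term kept in the code plus the teacher entropy, which is constant with respect to the student and therefore contributes no gradient.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# hypothetical teacher/student logits for a 4-class example
t_logits = [2.0, 1.0, 0.5, -1.0]
s_logits = [1.5, 1.2, 0.1, -0.5]

p_t = softmax(t_logits)
p_s = softmax(s_logits)

# full KL divergence: sum_i p_t[i] * (log p_t[i] - log p_s[i])
kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s))

# the form kept after dropping the teacher-only term: -sum_i p_t[i] * log p_s[i]
ce = -sum(pt * math.log(ps) for pt, ps in zip(p_t, p_s))

# teacher entropy: constant w.r.t. the student's parameters
h_t = -sum(pt * math.log(pt) for pt in p_t)

# identity: ce = kl + h_t, so the two losses differ by a constant
# and produce the same gradients for the student
print(abs(ce - (kl + h_t)) < 1e-12)  # True
```

Since the two losses differ only by the constant `h_t`, training with `ce` updates the student exactly as training with `kl` would; only the logged number is shifted.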
Thanks for your attention! If you want to distill models in OpenMMLab-related repos, you could join the WeChat group in README.md.
Updated. @sunshiding @Chan-Sun
Updated @Nireil
Thanks for your attention! The arg is deprecated. If you want to distill models in OpenMMLab-related repos, you could join the WeChat group in README.md.
> Using the code in train for distillation training, the log shows the loss dropping a little at first and then oscillating around 44.x, which looks like it is not converging. Is something set up wrong?

A large loss value is normal. In the implementation, the gradient-free term was removed from the KL computation, so the reported number comes out somewhat larger.
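One consequence worth noting: because the retained term is a cross-entropy rather than the full KL, the logged loss cannot reach zero even for a perfect student; it bottoms out at the teacher's entropy. A minimal sketch with a made-up teacher distribution:

```python
import math

# hypothetical teacher distribution over 5 classes (illustration only)
p_t = [0.4, 0.3, 0.15, 0.1, 0.05]

# best case: the student matches the teacher exactly (p_s == p_t);
# the retained cross-entropy term then equals the teacher entropy
loss_floor = -sum(pt * math.log(pt) for pt in p_t)

# the true KL at this optimum is zero, but the logged loss is not
print(loss_floor > 0)  # True
```

So a loss that drops a little and then plateaus at a nonzero value can still mean the student is converging; the plateau reflects the (summed) teacher entropy, not a training failure.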