mmrazor
Why is the accuracy of the distilled student lower than the model trained with mmpretrain?
- I have two models trained with mmpretrain: Res2Net and ShuffleNetV1.
- In mmrazor, I used the Res2Net model and its trained weights as the teacher, and the ShuffleNetV1 model as the student.
- However, the accuracy of the distilled student (ShuffleNetV1) is lower than that of the ShuffleNetV1 trained directly in mmpretrain.
I have the same problem. Have you found a solution?
It should not be lower. Try other combinations of hyperparameters (check the paper associated with the distillation method you are using). Another possible explanation is that the architectures are too different from one another: several papers argue that when the teacher is too different from the student, distillation can actually reduce performance.
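To make the hyperparameter advice concrete, below is a minimal, dependency-free sketch of the standard Hinton-style KD objective that most mmrazor distillation recipes build on. The function names and default values (`temperature=4.0`, `alpha=0.5`) are illustrative, not mmrazor's API; in practice you would tune the temperature and the loss weight in your mmrazor config.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature softens the distribution,
    # exposing more of the teacher's "dark knowledge" about non-target classes.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Standard knowledge-distillation objective (illustrative sketch):

        alpha * T^2 * KL(teacher_soft || student_soft)
        + (1 - alpha) * CE(student, hard_label)

    `temperature` and `alpha` are the two hyperparameters most worth
    re-tuning when the distilled student underperforms its baseline.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence between softened teacher and student distributions.
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    # Ordinary cross-entropy against the ground-truth label (T = 1).
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce
```

If `alpha` is too high, or the temperature is mismatched to how different the teacher's and student's predictive distributions are (as with Res2Net vs. ShuffleNetV1), the KL term can pull the student away from the hard-label signal and hurt accuracy, which is one common cause of the behavior reported above.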