
Why is the accuracy of the distilled student lower than the same model trained with mmpretrain?

Open sofpya opened this issue 1 year ago • 2 comments

  1. I have two models trained with mmpretrain: Res2Net and ShuffleNetV1.
  2. In mmrazor, I used the Res2Net model and its trained weights as the teacher, and the ShuffleNetV1 model as the student.
  3. However, the accuracy of the distilled student (ShuffleNetV1) is lower than that of the ShuffleNetV1 trained directly in mmpretrain.

sofpya · Nov 01 '23 01:11

I have the same problem. Did you find a solution?

youwenjing · Jan 23 '24 08:01

It should not. Try other combinations of hyperparameters (check the paper associated with the method you use). Another possible explanation is that the two architectures are too different from one another: several papers argue that when the teacher is too different from the student, distillation can reduce performance.

Veccoy · May 07 '24 14:05
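
To make the hyperparameter suggestion concrete, here is a minimal sketch of a soft-target distillation loss in plain Python. It assumes the method in use is logit-based KD with a temperature and a loss weight, which the thread does not confirm. The names `softmax`, `kd_loss`, `tau` and `loss_weight` and the example logits are illustrative only, not taken from the poster's config or from mmrazor's API.

```python
import math

def softmax(logits, tau=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / tau for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, tau=1.0, loss_weight=1.0):
    """KL(teacher || student) on softened distributions, scaled by tau**2.

    Hypothetical helper for illustration; the tau**2 factor is the usual
    convention that keeps gradient magnitudes comparable across temperatures.
    """
    p_t = softmax(teacher_logits, tau)
    p_s = softmax(student_logits, tau)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_t, p_s) if pt > 0)
    return loss_weight * (tau ** 2) * kl

# Made-up logits for a single sample: a confident teacher, a less confident student.
teacher = [8.0, 2.0, 0.5]
student = [3.0, 2.5, 1.0]

# Sweep the two hyperparameters to see how strongly they change the loss scale.
for tau in (1.0, 4.0):
    for w in (0.5, 1.0):
        loss = kd_loss(student, teacher, tau=tau, loss_weight=w)
        print(f"tau={tau} loss_weight={w} kd_loss={loss:.4f}")
```

If the distillation term is weighted too heavily relative to the classification loss, or the temperature does not suit the teacher's output distribution, the student can end up worse than a plainly trained baseline, so sweeping these two values first is a cheap check before changing the teacher architecture.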