mmrazor
Why is the accuracy of the distilled student lower than the model trained with mmpretrain?
- I have two models trained with mmpretrain: Res2Net and ShuffleNetV1.
- In mmrazor, I used the Res2Net model and its trained weights as the teacher, and the ShuffleNetV1 model as the student.
- However, the accuracy of the distilled student (ShuffleNetV1) is lower than that of the ShuffleNetV1 trained directly in mmpretrain.
I have the same problem. Have you found a solution?
It should not be lower. Try other combinations of hyperparameters (check the paper associated with the distillation method you are using). Another possible explanation is that the architectures are too different from one another: several papers argue that when the teacher is too different from the student, distillation can actually reduce performance.
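To make the hyperparameter advice concrete, below is a minimal, dependency-free sketch of the standard Hinton-style KD objective that most mmrazor distillation recipes build on. The function names and default values (`temperature=4.0`, `alpha=0.5`) are illustrative, not mmrazor's API; in practice you would tune the temperature and the loss weight in your mmrazor config.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature softens the distribution,
    # exposing more of the teacher's "dark knowledge" about non-target classes.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Standard knowledge-distillation objective (illustrative sketch):

        alpha * T^2 * KL(teacher_soft || student_soft)
        + (1 - alpha) * CE(student, hard_label)

    `temperature` and `alpha` are the two hyperparameters most worth
    re-tuning when the distilled student underperforms its baseline.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence between softened teacher and student distributions.
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student))
    # Ordinary cross-entropy against the ground-truth label (T = 1).
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce
```

If `alpha` is too high, or the temperature is mismatched to how different the teacher's and student's predictive distributions are (as with Res2Net vs. ShuffleNetV1), the KL term can pull the student away from the hard-label signal and hurt accuracy, which is one common cause of the behavior reported above.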