EfficientFormer
Accuracy from distillation?
Thanks for the great work. One question regarding accuracy: are the reported numbers obtained with distillation or not? The released model has a distillation head, so I wonder what the number is without distillation. Distillation usually gives a 2-3 point boost.
77.1 from my side.
I'm getting 77.8 for L1 from my own implementation without distillation. By tuning the hyperparameters I expect 78+, but that is still far from the 79.2 reported in the paper. Since the code uses distillation, I assume 79.2 is with distillation. Could you please clarify? Thanks.
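For reference, this is a minimal sketch of the DeiT-style hard-label distillation I'm assuming is in play here; the names (`model`, `teacher`, `images`, `labels`) are placeholders rather than the repo's actual API:

```python
import torch
import torch.nn.functional as F

def hard_distillation_loss(model, teacher, images, labels):
    # Student returns two logits: one from the class head, one from the distillation head.
    cls_logits, dist_logits = model(images)

    # Teacher (e.g. a pretrained CNN) provides hard pseudo-labels; no gradients needed.
    with torch.no_grad():
        teacher_labels = teacher(images).argmax(dim=-1)

    # Class head is supervised by ground truth, distillation head by the teacher's predictions.
    loss_cls = F.cross_entropy(cls_logits, labels)
    loss_dist = F.cross_entropy(dist_logits, teacher_labels)
    return 0.5 * (loss_cls + loss_dist)
```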
The 79.2 for L1 is with distillation, trained for 300 epochs.
We haven't trained the model w/o distillation, but can try it later.
Thanks for the clarification! That makes more sense now.