EfficientFormer
Accuracy from distillation?
Thanks for the great work. One question regarding accuracy: are the reported numbers obtained with distillation or not? The released model has a distillation head, so I wonder what the number is without distillation. Distillation usually gives a 2-3 point boost.
77.1 from my side.
I'm getting 77.8 for L1 from my own implementation without distillation. By tuning the hyperparameters I expect 78+, but that is still far from the 79.2 reported in the paper. Since the code uses distillation, I assume 79.2 is with distillation. Could you please clarify? Thanks.
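For reference, this is a minimal sketch of the DeiT-style hard-label distillation I'm assuming is in play here; the names (`model`, `teacher`, `images`, `labels`) are placeholders rather than the repo's actual API:

```python
import torch
import torch.nn.functional as F

def hard_distillation_loss(model, teacher, images, labels):
    # Student returns two logits: one from the class head, one from the distillation head.
    cls_logits, dist_logits = model(images)

    # Teacher (e.g. a pretrained CNN) provides hard pseudo-labels; no gradients needed.
    with torch.no_grad():
        teacher_labels = teacher(images).argmax(dim=-1)

    # Class head is supervised by ground truth, distillation head by the teacher's predictions.
    loss_cls = F.cross_entropy(cls_logits, labels)
    loss_dist = F.cross_entropy(dist_logits, teacher_labels)
    return 0.5 * (loss_cls + loss_dist)
```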
The 79.2 for L1 is with distillation, trained for 300 epochs.
We haven't trained the model w/o distillation, but can try it later.
Thanks for the clarification! That makes more sense now.