Does using GELU or ReLU have a critical impact on the performance of the T0 model?

Open • MenSanYan opened this issue 2 years ago • 1 comment

I noticed that you use GELU in small models like T0 and T1 and ReLU in larger models like T2. Is this intentional or just an oversight?

MenSanYan • Mar 22 '23 09:03

Hi, as noted in the ablation study of the paper, we found empirically that GELU suits the FasterNet-T0/T1 models better than ReLU, while the opposite holds for FasterNet-T2/S/M/L. We conjecture that GELU strengthens FasterNet-T0/T1 by providing higher non-linearity, and that this benefit fades for the larger FasterNet variants.

JierunChen • Mar 22 '23 10:03
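
For readers who want to see the per-variant choice concretely, here is a minimal sketch in plain Python. It is not the repository's code: the `ACT_BY_VARIANT` table and the `get_activation` helper are hypothetical names introduced for illustration, and only the GELU/ReLU split itself comes from the reply above. In a PyTorch model the two functions would correspond to `nn.GELU` and `nn.ReLU`.

```python
import math

# Hypothetical lookup table: which activation each FasterNet variant uses,
# following the reply above (GELU for T0/T1, ReLU for T2/S/M/L).
ACT_BY_VARIANT = {
    "T0": "gelu", "T1": "gelu",
    "T2": "relu", "S": "relu", "M": "relu", "L": "relu",
}


def relu(x: float) -> float:
    """ReLU: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def get_activation(variant: str):
    """Hypothetical helper: return the activation function for a variant."""
    return gelu if ACT_BY_VARIANT[variant] == "gelu" else relu


if __name__ == "__main__":
    # Compare the two activations on a few sample inputs.
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"x={x:+.1f}  relu={relu(x):+.4f}  gelu={gelu(x):+.4f}")
```

Unlike ReLU, GELU is smooth and gives small negative outputs for negative inputs, which is one way to read the "higher non-linearity" mentioned in the reply. Whether that accounts for the accuracy difference remains the authors' conjecture.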