FasterNet
Does the choice of GELU or ReLU have a critical impact on the performance of the T0 model?
I noticed that you use GELU in the smaller models like T0 and T1 and ReLU in larger models like T2. Is this intentional, or just an oversight?