CMT.pytorch
About the order of the last 1x1 conv and avgpooling
In your paper, the last 1x1 conv comes after the avgpooling, but in this implementation it comes before the avgpooling. Is this a new trick or a mistake?
Hi, these two implementations achieve similar results. The fc-bn-swish-avgpool ordering can be slightly better (~0.1% top-1 accuracy).
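For anyone comparing the two orderings, here is a minimal PyTorch sketch of both heads. It is not the repository's code: the class names, the channel sizes (64 in, 128 hidden, 10 classes), and the trailing linear classifier are assumptions made for illustration. The two heads are not mathematically equivalent because Swish is nonlinear; a purely linear 1x1 conv would commute with average pooling.

```python
import torch
import torch.nn as nn


def make_proj(in_ch: int, hidden_ch: int) -> nn.Sequential:
    # 1x1 conv acting as a per-position fc, followed by BN and Swish (SiLU).
    return nn.Sequential(
        nn.Conv2d(in_ch, hidden_ch, kernel_size=1, bias=False),
        nn.BatchNorm2d(hidden_ch),
        nn.SiLU(),
    )


class ConvThenPoolHead(nn.Module):
    """fc-bn-swish-avgpool: project every spatial position, then pool."""

    def __init__(self, in_ch: int, hidden_ch: int, num_classes: int):
        super().__init__()
        self.proj = make_proj(in_ch, hidden_ch)
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Final classifier is an assumption of this sketch.
        self.classifier = nn.Linear(hidden_ch, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.proj(x)             # (N, hidden_ch, H, W)
        x = self.pool(x).flatten(1)  # (N, hidden_ch)
        return self.classifier(x)


class PoolThenConvHead(nn.Module):
    """avgpool-fc-bn-swish: pool first, then project the pooled vector."""

    def __init__(self, in_ch: int, hidden_ch: int, num_classes: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj = make_proj(in_ch, hidden_ch)
        self.classifier = nn.Linear(hidden_ch, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(x)             # (N, in_ch, 1, 1)
        x = self.proj(x).flatten(1)  # (N, hidden_ch)
        return self.classifier(x)


# Hypothetical sizes, chosen only to show that both heads accept the same input.
x = torch.randn(2, 64, 7, 7)
head_a = ConvThenPoolHead(64, 128, 10).eval()
head_b = PoolThenConvHead(64, 128, 10).eval()
with torch.no_grad():
    out_a = head_a(x)
    out_b = head_b(x)
```

Both heads have the same parameter count. The conv-first variant applies the 1x1 conv at every spatial position, so it costs H×W times more multiply-adds in that layer than the pool-first variant.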