do-you-even-need-attention
Interaction between patches through a transpose may have a stronger role to play?
Hi, I was going through your experiment report. You made the point that, since you were able to get good performance without using attention layers, the strong performance of ViT may have more to do with its patch embedding layer than with attention.
But I believe it may also have to do with how you establish interaction between patches through a transpose, very similar to what was done in MLP-Mixer; see the rough sketch below for what I mean.
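To make the idea concrete, here is a minimal sketch (not your actual code, just an illustrative assumption with made-up names and dimensions) of a feed-forward layer applied across the patch dimension via a transpose, in the spirit of MLP-Mixer's token-mixing MLP:

```python
import torch
import torch.nn as nn


class PatchMixingFF(nn.Module):
    """Feed-forward applied over the patch (token) dimension rather than the channel dimension."""

    def __init__(self, num_patches: int, hidden_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_patches, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, num_patches),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, channels)
        x = x.transpose(1, 2)      # -> (batch, channels, num_patches)
        x = self.net(x)            # mixes information across patches
        return x.transpose(1, 2)   # -> (batch, num_patches, channels)


# Example: 196 patches (14x14 grid), 384 channels (illustrative sizes)
x = torch.randn(2, 196, 384)
out = PatchMixingFF(num_patches=196, hidden_dim=384)(x)
print(out.shape)  # torch.Size([2, 196, 384])
```

Because the linear layers here act along the patch axis, every patch can exchange information with every other patch, so some cross-patch mixing still happens even without attention.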
Would love to know your thoughts on this?