acnn
Attention first or convolution first?
Hi, my implementation is similar to yours. In the input attention layer, I applied a convolution with kernel size 3 first and then multiplied the result by the attention weights. I didn't see a mathematical difference between this version and the sliding-window version. What's your opinion on it?
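
For concreteness, here is a minimal PyTorch sketch of the two orderings being compared. The tensor shapes, the form of the attention weights, and all variable names are assumptions for illustration only, not the actual code from either implementation:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed setup: a batch of 1-D sequences with C channels.
B, C, L = 2, 4, 16
x = torch.randn(B, C, L)                             # input features
attn = torch.softmax(torch.randn(B, 1, L), dim=-1)   # per-position attention weights

conv = nn.Conv1d(C, C, kernel_size=3, padding=1, bias=False)

# Ordering A (this comment's version): convolve first, then multiply by attention.
out_conv_first = conv(x) * attn

# Ordering B: weight the input by attention first, then convolve.
out_attn_first = conv(x * attn)

# Check numerically whether the two orderings agree. Since the convolution
# mixes neighboring positions, they coincide only when the attention weights
# are constant within each kernel window.
print(torch.allclose(out_conv_first, out_attn_first))
print((out_conv_first - out_attn_first).abs().max())
```

Running a comparison like this on random inputs might be a quick way to settle whether the two versions are actually equivalent for the attention weights used here.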