GatsbyUSTC

Results: 1 issue by GatsbyUSTC

Hi, my implementation is similar to yours. In the input attention layer, I first applied a convolution with kernel size 3 and then multiplied the result by the attention weights. I didn't see a mathematical...
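For readers following along, here is a minimal sketch of the ordering described in this comment (convolve first, then scale by attention). It is only an illustration under stated assumptions: it uses NumPy, a zero-padded convolution over the sequence axis, and one scalar attention weight per token. The names `conv1d_same` and `conv_then_attend` are hypothetical and are not taken from either implementation discussed here.

```python
import numpy as np

def conv1d_same(x, kernel):
    """Zero-padded 1D convolution over the sequence axis.

    x:      (seq_len, dim) token representations
    kernel: (k, dim, out_dim) filter bank, k odd (k = 3 in the comment above)
    Returns an array of shape (seq_len, out_dim).
    """
    k = kernel.shape[0]
    pad = k // 2
    # Pad only the sequence axis so the output keeps seq_len rows.
    padded = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([
        # Contract each (k, dim) window against the (k, dim, out_dim) kernel.
        np.tensordot(padded[i:i + k], kernel, axes=([0, 1], [0, 1]))
        for i in range(x.shape[0])
    ])

def conv_then_attend(x, kernel, attention):
    """Assumed ordering: convolve first, then scale each position by its attention weight."""
    features = conv1d_same(x, kernel)       # (seq_len, out_dim)
    return features * attention[:, None]    # broadcast one weight per position

# Toy usage with made-up shapes: 4 tokens, 2 input dims, 1 output channel.
x = np.ones((4, 2))
kernel = np.ones((3, 2, 1))
attention = np.array([1.0, 0.0, 0.5, 2.0])
out = conv_then_attend(x, kernel, attention)
```

Whether the attention should instead be applied to the input before the convolution is a separate design question that this sketch does not settle.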