
A question about the paper

Open csEylLee opened this issue 2 years ago • 0 comments

In your paper, you mention that "In practice, the projection layer can transform the queries to any desired output, making the self-attention module redundant". But self-attention contains a softmax, which makes it non-linear in general, whereas a projection layer can only apply linear transformations. I don't understand why you say it can transform the queries to any desired output.
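For context, here is a minimal sketch of the two operations being compared. It assumes the setting described in the paper, where the decoder queries are fixed, learned parameters that do not depend on the input; the dimensions and variable names are hypothetical:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

d = 8          # embedding dimension (hypothetical)
n_queries = 4  # number of learned queries (hypothetical)

# Fixed learned queries: parameters, not a function of the input image.
Q = torch.randn(n_queries, d)

# Self-attention over the queries. The softmax is non-linear in Q ...
attn = F.softmax(Q @ Q.T / d**0.5, dim=-1)
sa_out = attn @ Q

# ... but since Q never changes between forward passes, sa_out is a
# constant tensor. A plain learnable linear projection of Q (or just
# learning the output vectors directly) spans the same set of
# achievable fixed outputs.
proj = torch.nn.Linear(d, d, bias=False)
lin_out = proj(Q)

print(sa_out.shape, lin_out.shape)  # both (n_queries, d)
```

If this reading is right, the claim is not that a linear layer can express softmax as a function, but that for input-independent queries the self-attention output is just another fixed set of vectors, which the trainable parameters can produce directly.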

csEylLee · Jul 21 '23 07:07