MultiheadAttention Usage in Decoder

Open gnillling opened this issue 1 year ago • 2 comments

I really appreciate your work, but I have a question after reviewing the code. In the paper's figure, the output of the first multi-head attention block in the decoder is fed into the second multi-head attention block. However, I couldn't find where this is implemented in the code.
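For reference, here is a minimal sketch of the wiring described above, assuming a standard Transformer-style decoder layer built on PyTorch's `torch.nn.MultiheadAttention`. The class name `DecoderLayerSketch`, the dimensions, and the omission of masks and the feed-forward block are all assumptions made for illustration; none of this is taken from the repository.

```python
import torch
import torch.nn as nn


class DecoderLayerSketch(nn.Module):
    """Hypothetical decoder layer for illustration; not this repository's code."""

    def __init__(self, d_model: int = 64, num_heads: int = 4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, tgt: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # First attention: decoder tokens attend to each other (no causal mask here).
        sa_out, _ = self.self_attn(tgt, tgt, tgt)
        x = self.norm1(tgt + sa_out)
        # Second attention: the output of the first block is the query,
        # while the encoder output (memory) supplies the keys and values.
        ca_out, _ = self.cross_attn(x, memory, memory)
        return self.norm2(x + ca_out)


torch.manual_seed(0)
layer = DecoderLayerSketch().eval()
tgt = torch.randn(2, 5, 64)     # (batch, target length, d_model)
memory = torch.randn(2, 7, 64)  # (batch, source length, d_model)
out = layer(tgt, memory)        # same shape as tgt
```

The line to look for in the actual code would be the equivalent of the `cross_attn(x, memory, memory)` call, where the query comes from the first attention block rather than from the raw decoder input.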

Feb 28 '25 13:02 gnillling