minGPT

Output of CausalSelfAttention

Open · whchan05 opened this issue 11 months ago · 1 comment

It seems that the output of this block is simply the multiple heads reshaped back into a single tensor. In the original "Attention Is All You Need" paper, the concatenated heads are then passed through another linear layer, W^O. Is leaving it out intentional, or is it an error? Thank you.
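For reference, here is a minimal NumPy sketch of what the paper describes: MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O. It is not taken from the minGPT source. The function name `causal_multi_head_attention` and the parameters `w_qkv` and `w_o` are hypothetical names chosen for illustration, and biases and dropout are left out. It returns both the reshaped (concatenated) heads and the result after the W^O projection so the two can be compared.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_multi_head_attention(x, w_qkv, w_o, n_head):
    """Illustrative sketch only (hypothetical names, no biases or dropout).

    x:     (T, C) sequence of T token embeddings of width C
    w_qkv: (C, 3C) fused query/key/value projection
    w_o:   (C, C) output projection, the W^O of the paper
    Returns (concat, out): the re-assembled heads and concat @ w_o.
    """
    T, C = x.shape
    hs = C // n_head  # per-head width
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)  # each (T, C)

    # Split the channel dimension into heads: (T, C) -> (n_head, T, hs).
    q = q.reshape(T, n_head, hs).transpose(1, 0, 2)
    k = k.reshape(T, n_head, hs).transpose(1, 0, 2)
    v = v.reshape(T, n_head, hs).transpose(1, 0, 2)

    # Scaled dot-product scores, masked so position t sees only positions <= t.
    att = q @ k.transpose(0, 2, 1) / np.sqrt(hs)  # (n_head, T, T)
    mask = np.tril(np.ones((T, T), dtype=bool))
    att = np.where(mask, att, -np.inf)
    att = softmax(att, axis=-1)

    heads = att @ v  # (n_head, T, hs)

    # "Concat": put the heads side by side again, (n_head, T, hs) -> (T, C).
    concat = heads.transpose(1, 0, 2).reshape(T, C)

    # The extra linear layer from the paper, applied after the concatenation.
    out = concat @ w_o
    return concat, out

# Small random example: 4 tokens, width 8, 2 heads.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w_qkv = rng.standard_normal((8, 24))
w_o = rng.standard_normal((8, 8))
concat, out = causal_multi_head_attention(x, w_qkv, w_o, n_head=2)
print(concat.shape, out.shape)
```

In this sketch the reshape on its own only places the heads next to each other. The `w_o` multiplication is the step that mixes information across heads, which is the layer the question is about.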

whchan05 · Jul 06 '23 23:07