Diff Attention out_proj dimension fix
There is a subtle potential issue: `self.out_proj = nn.Linear(embed_dim, embed_dim)` is applied to `attn.reshape(bsz, tgt_len, self.num_heads * 2 * self.head_dim)`, which breaks in the rare case that `embed_dim != self.num_heads * 2 * self.head_dim`.
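For illustration, a minimal standalone sketch of how the mismatch can surface, assuming `head_dim` is derived with integer division as `embed_dim // num_heads // 2`; the concrete numbers are hypothetical:

```python
import torch
import torch.nn as nn

# Hypothetical dimensions chosen only to trigger the mismatch: the integer
# division drops a remainder when embed_dim is not divisible by 2 * num_heads.
embed_dim, num_heads = 100, 8
head_dim = embed_dim // num_heads // 2            # 6, so num_heads * 2 * head_dim = 96

out_proj = nn.Linear(embed_dim, embed_dim)        # expects 100 input features

bsz, tgt_len = 2, 16
attn = torch.randn(bsz, tgt_len, num_heads, 2 * head_dim)
attn = attn.reshape(bsz, tgt_len, num_heads * 2 * head_dim)  # last dim is 96, not 100

out = out_proj(attn)  # raises a RuntimeError due to the shape mismatch
```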
Small fix in the `__init__` of the base and flash attention implementations.
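A minimal sketch of what the adjusted `__init__` could look like, with only the attributes relevant to the fix shown; the class name and the `head_dim` derivation are assumptions for illustration:

```python
import torch.nn as nn

class MultiheadDiffAttnSketch(nn.Module):
    """Hedged sketch of the relevant projection layers, not the full module."""
    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        # assumed derivation of head_dim via integer division
        self.head_dim = embed_dim // num_heads // 2
        # fix: out_proj input features match the reshaped attention output
        # (num_heads * 2 * head_dim) instead of assuming it equals embed_dim
        self.out_proj = nn.Linear(self.num_heads * 2 * self.head_dim, embed_dim)
```

With this change, `out_proj` accepts the reshaped tensor regardless of whether `embed_dim` equals `num_heads * 2 * head_dim`, while still projecting back to `embed_dim`.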
@microsoft-github-policy-service agree