
fix bugs for multi_latent_attention

Open · xqiangx1991 opened this issue · 0 comments

  1. The initialization parameters of DotProductAttention and TEDotProductAttention differ. Constructing MultiLatentAttention with DotProductAttention as the core-attention backend therefore raises an error (see the first sketch below).
  2. In the MLASelfAttention module, k_pos_emb and k_no_pe have different sizes along the num_attention_heads dimension, so torch.cat raises an error (see the second sketch below).
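
A minimal sketch of the first mismatch. The class bodies and the `softmax_scale` keyword are illustrative stand-ins for whatever extra arguments the Transformer Engine backend accepts, not verbatim Megatron-LM signatures:

```python
# Illustrative sketch; parameter names are assumptions, not the exact
# Megatron-LM __init__ signatures.

class DotProductAttention:  # local backend: narrower constructor
    def __init__(self, config, layer_number, attn_mask_type, attention_type):
        self.config = config

class TEDotProductAttention:  # TE backend: accepts extra kwargs
    def __init__(self, config, layer_number, attn_mask_type, attention_type,
                 softmax_scale=None):
        self.config = config

def build_core_attention(cls):
    # MultiLatentAttention forwards the extra kwarg unconditionally,
    # which only the TE backend's constructor accepts.
    return cls(config={}, layer_number=1, attn_mask_type="causal",
               attention_type="self", softmax_scale=0.5)

build_core_attention(TEDotProductAttention)      # ok
try:
    build_core_attention(DotProductAttention)
except TypeError as e:
    print(e)  # unexpected keyword argument 'softmax_scale'
```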
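
And a minimal repro of the second issue with plain PyTorch tensors. Shapes are illustrative; in MLA the rotary (positional) part of the key is computed once with a single head dimension and shared across heads, hence the size-1 axis that clashes with k_no_pe:

```python
import torch

s, b, n_heads = 2, 1, 4                 # illustrative sizes
qk_head_dim, qk_pos_emb_dim = 8, 4

k_no_pe = torch.randn(s, b, n_heads, qk_head_dim)
k_pos_emb = torch.randn(s, b, 1, qk_pos_emb_dim)  # RoPE part, one shared head

# Fails: all non-concat dims must match, but dim 2 is 4 vs 1.
# torch.cat([k_no_pe, k_pos_emb], dim=-1)

# One possible fix: broadcast the positional part across heads first.
k_pos_emb = k_pos_emb.expand(-1, -1, n_heads, -1)
key = torch.cat([k_no_pe, k_pos_emb], dim=-1)     # [s, b, n_heads, 12]
print(key.shape)
```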

xqiangx1991 · Oct 09 '24 06:10