Megatron-LM
fix bugs for multi_latent_attention
- The initialization parameters for `DotProductAttention` and `TEDotProductAttention` differ, so using `DotProductAttention` to construct `MultiLatentAttention` results in an error (see the first sketch below).
- In the `MLASelfAttention` module, the dimensions of `k_pos_emb` and `k_no_pe` differ in terms of `num_attention_heads`, so `torch.cat` raises an error (see the second sketch below).
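A minimal sketch of the first bug, using hypothetical, simplified constructor signatures (the MLA-specific kwargs `softmax_scale`, `k_channels`, and `v_channels` here are illustrative assumptions, not the exact Megatron-LM API): the local `DotProductAttention` does not accept the extra keyword arguments that `MultiLatentAttention` forwards to its core-attention submodule, while `TEDotProductAttention` does.

```python
class DotProductAttention:
    # Local backend: does not accept the MLA-specific kwargs (sketch).
    def __init__(self, config, layer_number, attn_mask_type, attention_type):
        self.config = config

class TEDotProductAttention:
    # Transformer Engine backend: accepts the extra kwargs MLA needs (sketch).
    def __init__(self, config, layer_number, attn_mask_type, attention_type,
                 softmax_scale=None, k_channels=None, v_channels=None):
        self.softmax_scale = softmax_scale

def build_mla_core_attention(backend, config):
    # MultiLatentAttention forwards MLA-specific kwargs to its core-attention
    # submodule; only a backend whose signature accepts them can be used.
    return backend(
        config=config,
        layer_number=1,
        attn_mask_type="causal",
        attention_type="self",
        softmax_scale=0.125,   # illustrative values, not Megatron defaults
        k_channels=192,
        v_channels=128,
    )

build_mla_core_attention(TEDotProductAttention, config={})  # OK
try:
    build_mla_core_attention(DotProductAttention, config={})
except TypeError as e:
    print(f"fails as described: {e}")  # unexpected keyword argument
```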
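A minimal sketch of the second bug, assuming the MLA layout in which the RoPE portion of the key (`k_pos_emb`) is shared across heads (head dimension of 1) while the non-RoPE portion (`k_no_pe`) is per-head; the dimension sizes are illustrative, not the exact Megatron-LM values. Expanding `k_pos_emb` across the head dimension before concatenation is one plausible fix.

```python
import torch

sq, b = 16, 2                  # sequence length, batch size
num_attention_heads = 8
qk_head_dim = 128              # non-RoPE per-head key dim (illustrative)
qk_pos_emb_head_dim = 64       # RoPE key dim, shared across heads (illustrative)

k_no_pe = torch.randn(sq, b, num_attention_heads, qk_head_dim)
k_pos_emb = torch.randn(sq, b, 1, qk_pos_emb_head_dim)

# Direct concat fails: dim 2 is num_attention_heads for k_no_pe but 1 for k_pos_emb.
try:
    torch.cat([k_no_pe, k_pos_emb], dim=-1)
except RuntimeError as e:
    print(f"torch.cat fails as described: {e}")

# Broadcasting k_pos_emb across the head dimension makes the shapes compatible.
k_pos_emb = k_pos_emb.expand(-1, -1, num_attention_heads, -1)
key = torch.cat([k_no_pe, k_pos_emb], dim=-1)
print(key.shape)  # torch.Size([16, 2, 8, 192])
```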