Megatron-LM
fix bugs for multi_latent_attention
- The initialization parameters for `DotProductAttention` and `TEDotProductAttention` differ, so using `DotProductAttention` to construct `MultiLatentAttention` results in an error (see the first sketch below).
- In the `MLASelfAttention` module, the dimensions of `k_pos_emb` and `k_no_pe` differ in terms of `num_attention_heads`, so `torch.cat` raises an error (see the second sketch below).
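A minimal sketch of the first bug, using hypothetical, simplified constructor signatures (the MLA-specific kwargs `softmax_scale`, `k_channels`, and `v_channels` here are illustrative assumptions, not the exact Megatron-LM API): the local `DotProductAttention` does not accept the extra keyword arguments that `MultiLatentAttention` forwards to its core-attention submodule, while `TEDotProductAttention` does.

```python
class DotProductAttention:
    # Local backend: does not accept the MLA-specific kwargs (sketch).
    def __init__(self, config, layer_number, attn_mask_type, attention_type):
        self.config = config

class TEDotProductAttention:
    # Transformer Engine backend: accepts the extra kwargs MLA needs (sketch).
    def __init__(self, config, layer_number, attn_mask_type, attention_type,
                 softmax_scale=None, k_channels=None, v_channels=None):
        self.softmax_scale = softmax_scale

def build_mla_core_attention(backend, config):
    # MultiLatentAttention forwards MLA-specific kwargs to its core-attention
    # submodule; only a backend whose signature accepts them can be used.
    return backend(
        config=config,
        layer_number=1,
        attn_mask_type="causal",
        attention_type="self",
        softmax_scale=0.125,   # illustrative values, not Megatron defaults
        k_channels=192,
        v_channels=128,
    )

build_mla_core_attention(TEDotProductAttention, config={})  # OK
try:
    build_mla_core_attention(DotProductAttention, config={})
except TypeError as e:
    print(f"fails as described: {e}")  # unexpected keyword argument
```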
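A minimal sketch of the second bug, assuming the MLA layout in which the RoPE portion of the key (`k_pos_emb`) is shared across heads (head dimension of 1) while the non-RoPE portion (`k_no_pe`) is per-head; the dimension sizes are illustrative, not the exact Megatron-LM values. Expanding `k_pos_emb` across the head dimension before concatenation is one plausible fix.

```python
import torch

sq, b = 16, 2                  # sequence length, batch size
num_attention_heads = 8
qk_head_dim = 128              # non-RoPE per-head key dim (illustrative)
qk_pos_emb_head_dim = 64       # RoPE key dim, shared across heads (illustrative)

k_no_pe = torch.randn(sq, b, num_attention_heads, qk_head_dim)
k_pos_emb = torch.randn(sq, b, 1, qk_pos_emb_head_dim)

# Direct concat fails: dim 2 is num_attention_heads for k_no_pe but 1 for k_pos_emb.
try:
    torch.cat([k_no_pe, k_pos_emb], dim=-1)
except RuntimeError as e:
    print(f"torch.cat fails as described: {e}")

# Broadcasting k_pos_emb across the head dimension makes the shapes compatible.
k_pos_emb = k_pos_emb.expand(-1, -1, num_attention_heads, -1)
key = torch.cat([k_no_pe, k_pos_emb], dim=-1)
print(key.shape)  # torch.Size([16, 2, 8, 192])
```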