Megatron-LM
[QUESTION] rotary position embedding
Hello, looking at the code at https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/models/common/embeddings/rope_utils.py#L116,
when the `_apply_rotary_pos_emb_bshd` function is called, the behavior for MLA differs from normal GQA or MHA. For MLA, the code performs an extra rearrangement of the input tensor, moving the even-indexed dims to the first half and the odd-indexed dims to the second half. Can anyone explain the purpose of this step? Thanks!
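For reference, here is a minimal sketch of the rearrangement described above (not the actual Megatron-LM code; `interleaved_to_half_split` is a hypothetical helper name). The presumed intent is to convert an interleaved (GPT-J style) rotary layout, where dims are paired as (0,1), (2,3), ..., into the half-split (GPT-NeoX style) layout that a `rotate_half`-based RoPE kernel expects, so the same rotation code path can be reused:

```python
import torch

def interleaved_to_half_split(t: torch.Tensor) -> torch.Tensor:
    """Move even-indexed dims of the last dimension into the first half
    and odd-indexed dims into the second half.
    Illustrative only; assumes the last dimension is the rotary dimension."""
    d = t.shape[-1]
    return torch.cat([t[..., 0:d:2], t[..., 1:d:2]], dim=-1)

# Tiny usage example on an 8-dim vector:
x = torch.arange(8.0)                 # [0, 1, 2, 3, 4, 5, 6, 7]
print(interleaved_to_half_split(x))   # [0, 2, 4, 6, 1, 3, 5, 7]
```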
+1
Marking as stale. No activity in 60 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.