Megatron-LM
[QUESTION] rotary position embedding
Hello, looking at the code at https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/models/common/embeddings/rope_utils.py#L116,
when the `_apply_rotary_pos_emb_bshd` function is called, the behavior for MLA differs from normal GQA or MHA. For MLA, the code performs an extra rearrangement of the input tensor, moving the even-indexed dims to the first half and the odd-indexed dims to the second half. Can anyone explain the purpose of this step? Thanks!
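For reference, here is a minimal sketch of the rearrangement described above (not the actual Megatron-LM code; `interleaved_to_half_split` is a hypothetical helper name). The presumed intent is to convert an interleaved (GPT-J style) rotary layout, where dims are paired as (0,1), (2,3), ..., into the half-split (GPT-NeoX style) layout that a `rotate_half`-based RoPE kernel expects, so the same rotation code path can be reused:

```python
import torch

def interleaved_to_half_split(t: torch.Tensor) -> torch.Tensor:
    """Move even-indexed dims of the last dimension into the first half
    and odd-indexed dims into the second half.
    Illustrative only; assumes the last dimension is the rotary dimension."""
    d = t.shape[-1]
    return torch.cat([t[..., 0:d:2], t[..., 1:d:2]], dim=-1)

# Tiny usage example on an 8-dim vector:
x = torch.arange(8.0)                 # [0, 1, 2, 3, 4, 5, 6, 7]
print(interleaved_to_half_split(x))   # [0, 2, 4, 6, 1, 3, 5, 7]
```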
+1
Marking as stale. No activity in 60 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.