[BUG] multi_latent_attention does not support apply_rope_fusion
Describe the bug
Launching GPT pretraining with --multi-latent-attention fails at startup with:

ValueError: multi_latent_attention does not support apply_rope_fusion.
To Reproduce
MLA_ARGS=(
    --multi-latent-attention
    --qk-pos-emb-head-dim 64
    --qk-head-dim 128
    --q-lora-rank 1536
    --kv-lora-rank 512
    --v-head-dim 128
    --qk-layernorm
)
...
torchrun ${DISTRIBUTED_ARGS[@]} pretrain_gpt.py \
${MODEL_ARGS[@]} \
${MLA_ARGS[@]} \
${MOE_ARGS[@]} \
${DATA_ARGS[@]} \
${TRAINING_ARGS[@]} \
${MODEL_PARALLEL_ARGS[@]} \
${LOGGING_ARGS[@]}
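For a smaller reproduction that skips the launcher, the same guard can be hit at the config level. This is only a sketch: it assumes the check lives in TransformerConfig.__post_init__ and that multi_latent_attention and apply_rope_fusion are plain TransformerConfig fields, which I have not re-verified against core_r0.11.0.

# Hypothetical config-level reproduction (assumed API surface, see note above).
from megatron.core.transformer.transformer_config import TransformerConfig

config = TransformerConfig(
    num_layers=2,
    hidden_size=512,
    num_attention_heads=8,
    multi_latent_attention=True,  # mirrors --multi-latent-attention
    apply_rope_fusion=True,       # RoPE fusion left on, as in the failing run
)
# Expected here: ValueError: multi_latent_attention does not support apply_rope_fusion.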
Expected behavior
apply_rope_fusion should be automatically set to False during argument validation when multi_latent_attention is enabled, instead of aborting with a ValueError.
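A minimal sketch of that behavior, assuming the attribute names args.multi_latent_attention and args.apply_rope_fusion (taken from the CLI flags and the error message); this illustrates the request, not the patch that was eventually merged:

# Sketch: reconcile MLA with RoPE fusion during argument validation instead of
# raising later. Attribute names are assumptions based on the CLI flags.
import warnings

def reconcile_mla_rope_fusion(args):
    if getattr(args, "multi_latent_attention", False) and getattr(args, "apply_rope_fusion", False):
        warnings.warn(
            "multi_latent_attention does not support apply_rope_fusion; "
            "forcing apply_rope_fusion=False."
        )
        args.apply_rope_fusion = False
    return args

Calling something like this from validate_args (or anywhere before the transformer config is built) would let the run proceed with fused RoPE disabled rather than aborting.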
Stack trace/logs
    raise ValueError("multi_latent_attention does not support apply_rope_fusion.")
ValueError: multi_latent_attention does not support apply_rope_fusion.
Environment (please complete the following information):
- Megatron-LM core_r0.11.0
- PyTorch version 2.2.0
- CUDA version 12.1
- NCCL version
Proposed fix
Disable apply_rope_fusion during argument validation (e.g. in validate_args) when multi_latent_attention is enabled, as sketched under Expected behavior above.
Additional context
Thanks for flagging the issue—will add an assertion soon.
Marking as stale. No activity in 60 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.
Merged at https://github.com/NVIDIA/Megatron-LM/commit/9c1a53515582d826b82ac133de5bc7e0a0ce4142