[BUG] multi_latent_attention does not support apply_rope_fusion
Describe the bug
Launching GPT pretraining with --multi-latent-attention fails at startup with:

ValueError: multi_latent_attention does not support apply_rope_fusion.
To Reproduce
MLA_ARGS=(
    --multi-latent-attention
    --qk-pos-emb-head-dim 64
    --qk-head-dim 128
    --q-lora-rank 1536
    --kv-lora-rank 512
    --v-head-dim 128
    --qk-layernorm
)
...
torchrun ${DISTRIBUTED_ARGS[@]} pretrain_gpt.py \
${MODEL_ARGS[@]} \
${MLA_ARGS[@]} \
${MOE_ARGS[@]} \
${DATA_ARGS[@]} \
${TRAINING_ARGS[@]} \
${MODEL_PARALLEL_ARGS[@]} \
${LOGGING_ARGS[@]}
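For a smaller reproduction that skips the launcher, the same guard can be hit at the config level. This is only a sketch: it assumes the check lives in TransformerConfig.__post_init__ and that multi_latent_attention and apply_rope_fusion are plain TransformerConfig fields, which I have not re-verified against core_r0.11.0.

# Hypothetical config-level reproduction (assumed API surface, see note above).
from megatron.core.transformer.transformer_config import TransformerConfig

config = TransformerConfig(
    num_layers=2,
    hidden_size=512,
    num_attention_heads=8,
    multi_latent_attention=True,  # mirrors --multi-latent-attention
    apply_rope_fusion=True,       # RoPE fusion left on, as in the failing run
)
# Expected here: ValueError: multi_latent_attention does not support apply_rope_fusion.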
Expected behavior
apply_rope_fusion should be automatically set to False during argument validation when multi_latent_attention is enabled, instead of aborting with a ValueError.
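A minimal sketch of that behavior, assuming the attribute names args.multi_latent_attention and args.apply_rope_fusion (taken from the CLI flags and the error message); this illustrates the request, not the patch that was eventually merged:

# Sketch: reconcile MLA with RoPE fusion during argument validation instead of
# raising later. Attribute names are assumptions based on the CLI flags.
import warnings

def reconcile_mla_rope_fusion(args):
    if getattr(args, "multi_latent_attention", False) and getattr(args, "apply_rope_fusion", False):
        warnings.warn(
            "multi_latent_attention does not support apply_rope_fusion; "
            "forcing apply_rope_fusion=False."
        )
        args.apply_rope_fusion = False
    return args

Calling something like this from validate_args (or anywhere before the transformer config is built) would let the run proceed with fused RoPE disabled rather than aborting.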
Stack trace/logs
    raise ValueError("multi_latent_attention does not support apply_rope_fusion.")
ValueError: multi_latent_attention does not support apply_rope_fusion.
Environment (please complete the following information):
- Megatron-LM core_r0.11.0
- PyTorch version 2.2.0
- CUDA version 12.1
- NCCL version
Proposed fix
Disable apply_rope_fusion during argument validation (e.g. in validate_args) when multi_latent_attention is enabled, as sketched under Expected behavior above.
Additional context
Thanks for flagging the issue—will add an assertion soon.
Marking as stale. No activity in 60 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.
Merged at https://github.com/NVIDIA/Megatron-LM/commit/9c1a53515582d826b82ac133de5bc7e0a0ce4142