Boxiang Wang
I think you are talking about a different issue. `self.config.max_position_embeddings` is defined here: https://github.com/NVIDIA/Megatron-LM/blob/00efe37a85194a521789778ae47299ce8c054dc0/megatron/core/transformer/transformer_config.py#L1125 I think YaRN only needs `original_max_position_embeddings` for its computation.
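For context, here is a minimal sketch of why the computation only needs the original training context length. This follows the standard YaRN formulation from the paper, not Megatron-LM's exact implementation, and the function and argument names are hypothetical:

```python
import math

def yarn_scaling(target_seq_len, original_max_position_embeddings, attn_factor=1.0):
    """Illustrative YaRN-style scaling: everything is derived from the
    target context length and the original (pre-extension) training length."""
    # Context extension ratio s = L_target / L_original
    s = target_seq_len / original_max_position_embeddings
    # Attention temperature (mscale) from the YaRN paper: 0.1 * ln(s) + 1.0
    mscale = 0.1 * math.log(s) * attn_factor + 1.0 if s > 1.0 else 1.0
    return s, mscale

# Example: extending a model trained at 4k context to 32k
print(yarn_scaling(32768, 4096))  # s = 8.0, mscale ~= 1.208
```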
Thanks for the feedback. In that case, I can rename https://github.com/NVIDIA/Megatron-LM/blob/00efe37a85194a521789778ae47299ce8c054dc0/megatron/core/transformer/transformer_config.py#L1125 to `original_max_position_embeddings` to avoid the conflict.
@yzlnew @lostkevin I am trying to understand the root cause of this issue a bit better. Could you share a simple reproduction script? Thanks!
Hi, this issue should be fixed in the 25.07 container.
https://github.com/NVIDIA/Megatron-LM/blob/76144fe1106e4fb0e69aa75b7a6ab66e71e8f37f/megatron/core/transformer/transformer_config.py#L1288 is the fix. We will deprecate the old `max_position_embeddings` config in the next release, 25.09.
https://github.com/NVIDIA/Megatron-LM/commit/cb6ab12c49abfb767d82e7b07b57f16163e5d2e2 has been merged into main and the MCore 0.14.0 release (NeMo 25.09 container), completely deprecating `max_position_embeddings` in MLA. Closing this issue; please feel free to re-open.
Hi, we are targeting support for YaRN and other long-context features in the next release, NeMo 25.09. Currently it is not supported.
TE's https://github.com/NVIDIA/TransformerEngine/pull/2195 (2.9.0) is needed for this PR
/ok to test 7917e68
/ok to test 495f58d