Boxiang Wang

30 comments by Boxiang Wang

I think you are talking about a different issue. The `self.config.max_position_embeddings` you mention is defined here: https://github.com/NVIDIA/Megatron-LM/blob/00efe37a85194a521789778ae47299ce8c054dc0/megatron/core/transformer/transformer_config.py#L1125. I think YaRN only needs `original_max_position_embeddings` for its computation.
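For context, a minimal sketch (not Megatron-LM's actual implementation; the function names and the 4096/131072 numbers are illustrative) of why YaRN only depends on the pre-training context window:

```python
import math

# Sketch of the YaRN scale computation: only the pre-training context window
# (`original_max_position_embeddings`) enters the formula; the extended
# `max_position_embeddings` is not needed here.
def yarn_scale(target_seq_len: int, original_max_position_embeddings: int) -> float:
    return target_seq_len / original_max_position_embeddings

def yarn_mscale(scale: float) -> float:
    # Attention temperature factor from the YaRN paper: 0.1 * ln(s) + 1.
    return 1.0 if scale <= 1.0 else 0.1 * math.log(scale) + 1.0

s = yarn_scale(target_seq_len=131072, original_max_position_embeddings=4096)
print(s, yarn_mscale(s))  # scale factor and the corresponding logit scaling
```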

Thanks for the feedback. In that case, I can rename the field at https://github.com/NVIDIA/Megatron-LM/blob/00efe37a85194a521789778ae47299ce8c054dc0/megatron/core/transformer/transformer_config.py#L1125 to `original_max_position_embeddings` to avoid conflicts.
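For illustration, a hedged sketch of what the rename could look like; the dataclass subset and default value are illustrative, not the actual code at the linked line:

```python
from dataclasses import dataclass

@dataclass
class MLATransformerConfig:  # illustrative subset only
    # Previously exposed as `max_position_embeddings`; renamed so it is clear
    # this is the pre-training context window that YaRN scales from.
    original_max_position_embeddings: int = 4096
```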

@yzlnew @lostkevin I am trying to understand the root cause of this issue a little better. Could you share a minimal script that reproduces it? Thanks!

https://github.com/NVIDIA/Megatron-LM/blob/76144fe1106e4fb0e69aa75b7a6ab66e71e8f37f/megatron/core/transformer/transformer_config.py#L1288 is the fix. We will deprecate the old `max_position_embeddings` config in the next release, 25.09.
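A minimal sketch of the kind of backward-compatible fallback this implies; the helper name and the exact warning text are illustrative, not the actual code at the linked line:

```python
import warnings

def resolve_original_max_position_embeddings(config) -> int:
    # Prefer the new field; fall back to the deprecated
    # `max_position_embeddings` with a warning until it is removed.
    value = getattr(config, "original_max_position_embeddings", None)
    if value is not None:
        return value
    warnings.warn(
        "`max_position_embeddings` is deprecated for YaRN/MLA; "
        "use `original_max_position_embeddings` instead.",
        DeprecationWarning,
    )
    return config.max_position_embeddings
```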

https://github.com/NVIDIA/Megatron-LM/commit/cb6ab12c49abfb767d82e7b07b57f16163e5d2e2 has been merged into main and the MCore 0.14.0 release (NeMo 24.09 container), completely deprecating `max_position_embeddings` in MLA. Closing this issue; please feel free to re-open it.

Hi, we are targeting support for YaRN and other long-context features in the next release, NeMo 25.09. Currently it is not supported.

TE's https://github.com/NVIDIA/TransformerEngine/pull/2195 (TE 2.9.0) is needed for this PR.