Boxiang Wang

30 comments by Boxiang Wang

I think you are talking about a different issue. The `self.config.max_position_embeddings` you mention is defined here: https://github.com/NVIDIA/Megatron-LM/blob/00efe37a85194a521789778ae47299ce8c054dc0/megatron/core/transformer/transformer_config.py#L1125. I think YaRN only needs `original_max_position_embeddings` for its computation.
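For context, a minimal sketch (not Megatron-LM's actual implementation; the function names and the 4096/131072 numbers are illustrative) of why YaRN only depends on the pre-training context window:

```python
import math

# Sketch of the YaRN scale computation: only the pre-training context window
# (`original_max_position_embeddings`) enters the formula; the extended
# `max_position_embeddings` is not needed here.
def yarn_scale(target_seq_len: int, original_max_position_embeddings: int) -> float:
    return target_seq_len / original_max_position_embeddings

def yarn_mscale(scale: float) -> float:
    # Attention temperature factor from the YaRN paper: 0.1 * ln(s) + 1.
    return 1.0 if scale <= 1.0 else 0.1 * math.log(scale) + 1.0

s = yarn_scale(target_seq_len=131072, original_max_position_embeddings=4096)
print(s, yarn_mscale(s))  # scale factor and the corresponding logit scaling
```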

Thanks for the feedback. In that case, I can rename the field at https://github.com/NVIDIA/Megatron-LM/blob/00efe37a85194a521789778ae47299ce8c054dc0/megatron/core/transformer/transformer_config.py#L1125 to `original_max_position_embeddings` to avoid conflicts.
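For illustration, a hedged sketch of what the rename could look like; the dataclass subset and default value are illustrative, not the actual code at the linked line:

```python
from dataclasses import dataclass

@dataclass
class MLATransformerConfig:  # illustrative subset only
    # Previously exposed as `max_position_embeddings`; renamed so it is clear
    # this is the pre-training context window that YaRN scales from.
    original_max_position_embeddings: int = 4096
```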

@yzlnew @lostkevin I am trying to understand the root cause of this issue a little better. Could you share a minimal script that reproduces it? Thanks!

https://github.com/NVIDIA/Megatron-LM/blob/76144fe1106e4fb0e69aa75b7a6ab66e71e8f37f/megatron/core/transformer/transformer_config.py#L1288 is the fix. We will deprecate the old `max_position_embeddings` config in the next release, 25.09.
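A minimal sketch of the kind of backward-compatible fallback this implies; the helper name and the exact warning text are illustrative, not the actual code at the linked line:

```python
import warnings

def resolve_original_max_position_embeddings(config) -> int:
    # Prefer the new field; fall back to the deprecated
    # `max_position_embeddings` with a warning until it is removed.
    value = getattr(config, "original_max_position_embeddings", None)
    if value is not None:
        return value
    warnings.warn(
        "`max_position_embeddings` is deprecated for YaRN/MLA; "
        "use `original_max_position_embeddings` instead.",
        DeprecationWarning,
    )
    return config.max_position_embeddings
```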

https://github.com/NVIDIA/Megatron-LM/commit/cb6ab12c49abfb767d82e7b07b57f16163e5d2e2 has been merged into main and the MCore 0.14.0 release (NeMo 24.09 container), completely deprecating `max_position_embeddings` in MLA. Closing this issue; please feel free to re-open it.

Hi, we are targeting support for YaRN and other long-context features in the next release, NeMo 25.09. Currently it is not supported.

TE's https://github.com/NVIDIA/TransformerEngine/pull/2195 (TE 2.9.0) is needed for this PR.