Megatron-LM

[BUG] Unnecessary initialization for router in megatron-core

Open haolin-nju opened this issue 1 year ago • 2 comments

Describe the bug: In megatron/core/transformer/moe/router.py, the class Router, as well as TopKRouter, always performs weight initialization. However, there are cases where initialization is unnecessary, such as when converting checkpoints, since the loaded weights replace the initialized values anyway.

Proposed fix: Please kindly refer to PR914.
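
For illustration only, here is a minimal sketch of the kind of guard being asked for. It is not the code from the PR and does not use Megatron-LM's actual classes: `ToyConfig`, `ToyRouter`, `normal_init`, `load_weight`, and the `perform_initialization` flag are all hypothetical names chosen for this example, and plain Python lists stand in for tensors.

```python
import random
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


def normal_init(weight: Matrix, std: float = 0.02) -> None:
    """Fill the weight in place with Gaussian noise (stand-in for a real init method)."""
    for row in weight:
        for j in range(len(row)):
            row[j] = random.gauss(0.0, std)


@dataclass
class ToyConfig:
    hidden_size: int
    num_experts: int
    # Hypothetical flag: set to False when the weights will be loaded
    # from a checkpoint right after construction.
    perform_initialization: bool = True


class ToyRouter:
    """Toy router that owns a (num_experts x hidden_size) gating weight."""

    def __init__(self, config: ToyConfig) -> None:
        self.config = config
        # Allocate storage only; zeros stand in for uninitialized memory.
        self.weight: Matrix = [
            [0.0] * config.hidden_size for _ in range(config.num_experts)
        ]
        # Run the (potentially expensive) initialization only when requested.
        if config.perform_initialization:
            normal_init(self.weight)

    def load_weight(self, weight: Matrix) -> None:
        """Overwrite the gating weight, as a checkpoint conversion would."""
        self.weight = [list(row) for row in weight]


# Checkpoint-conversion path: skip initialization, then load the weights.
router = ToyRouter(ToyConfig(hidden_size=4, num_experts=2, perform_initialization=False))
router.load_weight([[1.0] * 4, [2.0] * 4])
```

Defaulting the flag to `True` keeps the training path unchanged, so only callers that load weights immediately afterwards need to opt out.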

haolin-nju avatar Jul 09 '24 09:07 haolin-nju

Thanks for reporting and fixing it. Unfortunately, we can't directly merge this PR on GitHub, but we'll include the fix in the next version.

yanring avatar Jul 16 '24 13:07 yanring

Marking as stale. No activity in 60 days.

github-actions[bot] avatar Sep 14 '24 18:09 github-actions[bot]

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot] avatar Aug 02 '25 02:08 github-actions[bot]

Thank you for submitting the issue and PR! Closing as fixed since it's already in the codebase.

sbhavani avatar Oct 05 '25 15:10 sbhavani