[BUG] Unnecessary initialization for router in megatron-core
Describe the bug
In megatron/core/transformer/moe/router.py, the Router class, as well as TopKRouter, always performs weight initialization. However, there are cases where initialization is unnecessary, such as when converting checkpoints.
Proposed fix
Please kindly refer to PR914.
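For readers without the PR at hand, here is a minimal, framework-free sketch of the general idea: allocate the router weight unconditionally, but run the initializer only when a config flag asks for it. This is not the contents of PR914. The names `SketchConfig`, `SketchRouter`, `normal_init`, and the flag name `perform_initialization` are assumptions made for illustration, and plain Python lists stand in for a real parameter tensor.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

Matrix = List[List[float]]


def normal_init(weight: Matrix, std: float = 0.02) -> None:
    """Fill `weight` in place with Gaussian samples (hypothetical init method)."""
    for row in weight:
        for j in range(len(row)):
            row[j] = random.gauss(0.0, std)


@dataclass
class SketchConfig:
    """Hypothetical config; the field names are assumptions for this sketch."""
    num_moe_experts: int
    hidden_size: int
    # When False, the router skips weight initialization entirely,
    # e.g. because a checkpoint will overwrite the weights right after.
    perform_initialization: bool = True
    init_method: Callable[[Matrix], None] = normal_init


class SketchRouter:
    """Stand-in for a router that owns an (experts x hidden) weight matrix."""

    def __init__(self, config: SketchConfig) -> None:
        self.config = config
        # Allocation always happens; zeros stand in for uninitialized storage.
        self.weight: Matrix = [
            [0.0] * config.hidden_size for _ in range(config.num_moe_experts)
        ]
        # Initialization is guarded by the flag instead of always running.
        if config.perform_initialization:
            config.init_method(self.weight)


# Checkpoint-conversion style usage: build the module without paying for init.
router = SketchRouter(SketchConfig(num_moe_experts=4, hidden_size=8,
                                   perform_initialization=False))
```

With the flag left at its default, the router behaves as before and the weights are initialized at construction time.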
Thanks for reporting and fixing it. Unfortunately, we can't directly merge this PR on GitHub, but we'll include the fix in the next version.
Marking as stale. No activity in 60 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.
Thank you for submitting the issue and PR! Closing as fixed since it's already in the codebase.