ZhiyuLi-Nvidia
> [!IMPORTANT]
> The `Update branch` button must only be pressed on very rare occasions.
> An outdated branch never blocks the merge of a PR.
> Please reach...
# What does this PR do?

Successful run after the fix, with tp2 and sq enabled on the Qwen model: https://wandb.ai/nvidia/grpo-dev-zhiyul/runs/nyq6n98w/overview?nw=nwuserzhiyul

# Issues

List issues that this PR closes ([syntax](https://docs.github.com/en/issues/tracking-your-work-with-issues/using-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword)): #...
The [Muon optimizer](https://github.com/KellerJordan/Muon?tab=readme-ov-file) has attracted a lot of interest in the community and is currently WIP in mcore. It has also been reported that model performance is even better if the same...