Boxiang Wang

Results 30 comments of Boxiang Wang

Thanks, Xuwen, for this great documentation. Can you also move it, or at least link it, [here](https://docs.nvidia.com/megatron-core/developer-guide/latest/api-guide/custom_fsdp.html) ([code](https://github.com/NVIDIA/Megatron-LM/blob/main/docs/source/api-guide/custom_fsdp.md))? I'm not sure where this doc/discussion is being published.

@JavaZeroo simply setting `--optimizer dist_muon` (the layerwise distributed optimizer) or `--optimizer muon` will turn on Muon.

Hi @dimapihtar, can you help take a look? Thanks