Juanxi Tian

Results 4 issues of Juanxi Tian

Add the MTMD model to the main branch

documentation
waiting for triage

# What does this PR do? This PR gives careful consideration to distributed training of Muon. 1. Distributed Training Support: Added gradient synchronization via reduce_scatter_tensor and parameter updates via all_gather_into_tensor for proper...
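The reduce-scatter and all-gather pattern named in this description can be illustrated with a small pure-Python simulation. This is only a sketch of the collective semantics, not the PR's code: the helper names (`reduce_scatter`, `all_gather`, `simulated_step`) are hypothetical, ranks are modeled as list entries rather than processes, and a plain averaged-gradient SGD step stands in for Muon's actual update rule.

```python
from typing import List


def reduce_scatter(per_rank: List[List[float]]) -> List[List[float]]:
    """Simulate a sum reduce-scatter: rank r receives the element-wise
    sum of shard r taken across every rank's full buffer."""
    world = len(per_rank)
    n = len(per_rank[0])
    assert n % world == 0, "buffer length must divide evenly across ranks"
    shard = n // world
    out = []
    for r in range(world):
        lo, hi = r * shard, (r + 1) * shard
        out.append([sum(buf[i] for buf in per_rank) for i in range(lo, hi)])
    return out


def all_gather(shards: List[List[float]]) -> List[List[float]]:
    """Simulate all-gather: every rank ends up with all shards concatenated."""
    full = [x for s in shards for x in s]
    return [list(full) for _ in shards]


def simulated_step(params: List[float],
                   grads_per_rank: List[List[float]],
                   lr: float) -> List[List[float]]:
    """One simulated sharded step: reduce-scatter the gradients, update only
    the local parameter shard, then all-gather the updated shards.
    The update is plain SGD (an assumption), not Muon's rule."""
    world = len(grads_per_rank)
    shard = len(params) // world
    summed = reduce_scatter(grads_per_rank)
    new_shards = []
    for r in range(world):
        lo = r * shard
        # Average the summed gradient shard over the number of ranks.
        avg = [g / world for g in summed[r]]
        new_shards.append([params[lo + i] - lr * avg[i] for i in range(shard)])
    return all_gather(new_shards)
```

In this sketch each rank does optimizer work on only `1 / world` of the parameters, which is the usual motivation for pairing the two collectives; whether the PR shards in exactly this way is not stated in the visible text.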

ScalingOpt is a professional platform focused on optimization for large-scale deep learning. It advocates "Optimization at Scale," meaning optimization algorithms that are verifiable and scalable. This community platform is...