Juanxi Tian
Add the MTMD model to the main branch
What does this PR do? This PR pays particular attention to distributed training of the Muon optimizer. 1. Distributed Training Support: Added gradient synchronization via reduce_scatter_tensor and parameter updates via all_gather_into_tensor for proper...
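The reduce-scatter/all-gather pattern described above can be illustrated with a single-process NumPy simulation. This is a minimal sketch under stated assumptions, not the PR's code. The helpers `newton_schulz5`, `reduce_scatter_mean`, `all_gather`, and `sharded_muon_step` are hypothetical. The collectives are simulated with Python lists instead of `torch.distributed`. Each rank is assumed to own whole parameter matrices, so the Newton-Schulz orthogonalization that Muon applies always sees a full matrix. All parameters are assumed to share one shape and to split evenly across ranks. The Newton-Schulz coefficients follow the public Muon reference implementation.

```python
import numpy as np


def newton_schulz5(g, steps=5, eps=1e-7):
    # Quintic Newton-Schulz iteration with the coefficients used in the public
    # Muon reference implementation; it approximately orthogonalizes g
    # (pushes its singular values toward 1) without computing an SVD.
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (np.linalg.norm(g) + eps)  # Frobenius normalization keeps spectral norm <= 1
    tall = x.shape[0] > x.shape[1]
    if tall:  # iterate on the wide orientation so x @ x.T is the smaller Gram matrix
        x = x.T
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * (s @ s)) @ x
    return x.T if tall else x


def reduce_scatter_mean(buffers):
    # Simulated reduce_scatter_tensor with averaging: every rank contributes a
    # full flat buffer and rank r receives chunk r of the element-wise mean.
    mean = np.mean(np.stack(buffers), axis=0)
    return np.split(mean, len(buffers))


def all_gather(shards):
    # Simulated all_gather_into_tensor: every rank receives the concatenation
    # of all ranks' shards, in rank order.
    full = np.concatenate(shards)
    return [full.copy() for _ in shards]


def sharded_muon_step(params, grads_per_rank, momenta, lr=0.02, mu=0.95):
    """One simulated data-parallel Muon step (hypothetical helper).

    params: list of equally shaped 2-D arrays, replicated on every rank.
    grads_per_rank[r][i]: rank r's local gradient for params[i].
    momenta: dict {param index: momentum buffer}; each index is only ever
        touched by the rank that owns it.
    Returns the list of parameter replicas, one full model per rank.
    """
    world, n = len(grads_per_rank), len(params)
    assert n % world == 0, "sketch assumes params split evenly across ranks"
    per_rank, shape = n // world, params[0].shape
    # 1) Flatten each rank's gradients into one buffer and reduce-scatter it,
    #    so rank r ends up with the averaged gradients of the params it owns.
    flat = [np.concatenate([g.ravel() for g in grads]) for grads in grads_per_rank]
    grad_shards = reduce_scatter_mean(flat)
    # 2) Each rank updates only its own params, using full matrices so the
    #    Newton-Schulz orthogonalization sees the whole gradient.
    new_shards = []
    for r in range(world):
        owned = range(r * per_rank, (r + 1) * per_rank)
        updated = []
        for i, g_flat in zip(owned, np.split(grad_shards[r], per_rank)):
            m = mu * momenta.get(i, np.zeros(shape)) + g_flat.reshape(shape)
            momenta[i] = m
            updated.append((params[i] - lr * newton_schulz5(m)).ravel())
        new_shards.append(np.concatenate(updated))
    # 3) All-gather the updated shards so every rank holds the full model again.
    return [[p.reshape(shape) for p in np.split(full, n)] for full in all_gather(new_shards)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = [rng.standard_normal((4, 3)) for _ in range(4)]
    grads = [[rng.standard_normal((4, 3)) for _ in params] for _ in range(2)]  # 2 ranks
    replicas = sharded_muon_step(params, grads, momenta={})
    # Reference: a single-process step on the averaged gradient (momentum starts at 0).
    ref = [p - 0.02 * newton_schulz5((grads[0][i] + grads[1][i]) / 2) for i, p in enumerate(params)]
    print(all(np.allclose(a, b) for rep in replicas for a, b in zip(rep, ref)))
```

Because each rank updates only the matrices it owns, the sharded step should agree with a single-process update on the averaged gradient, and every rank should end with an identical replica. Assigning whole matrices to ranks, rather than slicing each matrix, is one way to reconcile sharded communication with Muon's need for full-matrix orthogonalization; a real implementation may pad buffers or balance shards differently.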
ScalingOpt is a professional platform focused on optimization for large-scale deep learning. It advocates "Optimization at Scale," that is, optimization algorithms that are verifiable and scalable. This community platform is...