Haichen Huang
### 🐛 Describe the bug When a model has both fp16 and fp32 gradients, hybrid adam may be unable to update parameters correctly. Since we put all parameters into a list...
- refactor moe routers - fix moe bugs with activation checkpoint
# _A New ZeRO Implementation_ ## Background In the current version, our ZeRO has a performance issue. The reason is that our asymmetric distribution of chunks makes one process hinder...
# What's New ZeRO1 and ZeRO2 optimizers are added. Here is what to do next. * correct `clip_grad_norm` with model and pipeline parallelism * test training efficiency
### What's New Fix `NotImplementedError: Some torch function is incompatible because of its complcated inputs.` when training diffusers. * add an ignore step for no-grad tensors * change the...