Haichen Huang

Results 11 issues of Haichen Huang

### 🐛 Describe the bug When a model has both fp16 and fp32 gradients, hybrid Adam may be unable to update parameters correctly. Since we put all parameters into a list...

bug

- refactor MoE routers - fix MoE bugs with activation checkpoint

# _A New ZeRO Implementation_ ## Background In the current version, our ZeRO has a performance issue. The reason is that our asymmetric distribution of chunks makes one process hinder...

# What's New ZeRO1 and ZeRO2 optimizers are added. Here is what to do next. * correct `clip_grad_norm` with model and pipeline parallelism * test training efficiency

Run Build and Test

Run Build and Test

### What's New Fix `NotImplementedError: Some torch function is incompatible because of its complcated inputs.` when training diffusers. * add an ignore step for no-grad tensors * change the...