XiDianZuoYun

2 issues opened by XiDianZuoYun

Describe the bug: With ZeRO-2 + CPU Offload + overlap_comm=true, the IPG (Independent Partition Gradient) buckets are never populated. During gradient reduction (reduce_ipg_grads), we consistently observe empty buckets (bucket.index=0...

Labels: bug, compression
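The setup named in the first report (ZeRO stage 2, optimizer offload to CPU, overlap_comm=true) can be written as a DeepSpeed config. The sketch below is an assumption about how such a config might look, not the reporter's actual file: `train_batch_size` is a placeholder value, and only the `zero_optimization` keys reflect the settings the report mentions.

```python
import json

# Sketch (assumption): a DeepSpeed config dict matching the settings named in
# the report. train_batch_size is a placeholder, not taken from the issue.
ds_config = {
    "train_batch_size": 8,  # placeholder value
    "zero_optimization": {
        "stage": 2,                              # ZeRO-2: partition optimizer states and gradients
        "offload_optimizer": {"device": "cpu"},  # keep optimizer states on CPU
        "overlap_comm": True,                    # overlap gradient reduction with backward
    },
}

if __name__ == "__main__":
    # Print as JSON so the same settings can be saved as a config file.
    print(json.dumps(ds_config, indent=2))
```

A dict like this can be passed to `deepspeed.initialize` through its `config` argument, or saved as JSON and referenced from the launch command.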

Describe the bug: When training with DeepSpeed ZeRO Stage 2 and optimizer offload to CPU, calling engine.backward(loss_) results in empty IPG buckets during gradient reduction (e.g., bucket.buffer: []). This leads...

Labels: bug, training