XiDianZuoYun
Results: 2 issues of XiDianZuoYun
Describe the bug: With ZeRO-2 + CPU Offload + overlap_comm=true, the IPG (Independent Partition Gradient) buckets are never populated. During gradient reduction (reduce_ipg_grads), we consistently observe empty buckets (bucket.index=0...
bug
compression
Describe the bug: When training with DeepSpeed ZeRO Stage 2 and optimizer offload to CPU, calling engine.backward(loss_) results in empty IPG buckets during gradient reduction (e.g., bucket.buffer: []). This leads...
bug
training
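
Both reports describe the same setup: ZeRO Stage 2, optimizer offload to CPU, and (in the first) overlap_comm enabled. For orientation, a minimal sketch of a DeepSpeed config dict with those three settings might look like the following. The batch size and the `describe` helper are assumptions added for illustration only; neither appears in the issues, and the reporter's actual config is not shown in the truncated snippets.

```python
import json

# Illustrative config only: the keys under "zero_optimization" are standard
# DeepSpeed config keys, but the values for anything not named in the issues
# (e.g. the micro batch size) are assumptions.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # assumed value, not from the issues
    "zero_optimization": {
        "stage": 2,                              # ZeRO Stage 2
        "offload_optimizer": {"device": "cpu"},  # optimizer offload to CPU
        "overlap_comm": True,                    # overlap reduction with backward
    },
}


def describe(config):
    """Hypothetical helper: summarize the ZeRO settings relevant to the reports."""
    zero = config.get("zero_optimization", {})
    return {
        "stage": zero.get("stage"),
        "cpu_offload": zero.get("offload_optimizer", {}).get("device") == "cpu",
        "overlap_comm": bool(zero.get("overlap_comm", False)),
    }


if __name__ == "__main__":
    # Print the config as JSON, the form usually stored in a ds_config.json file.
    print(json.dumps(ds_config, indent=2))
    print(describe(ds_config))
```

A dict like this is typically passed to DeepSpeed at initialization or saved as a JSON file; toggling `overlap_comm` between runs is one way to check whether the empty-bucket behavior depends on that flag.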