Olatunji Ruwase
Fix #2071
Add a configuration option to enable the fused fp16 optimizer.
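A minimal sketch of where such an option would live in a DeepSpeed config. The standard `fp16.enabled` switch is real; the `fused` key shown commented out is only a placeholder name for the new option this PR introduces, not a confirmed setting.

```python
import torch
import deepspeed

# Sketch: standard fp16 config plus a placeholder for the new fused-optimizer option.
ds_config = {
    "train_batch_size": 8,
    "fp16": {
        "enabled": True,   # standard fp16 switch
        # "fused": True,   # hypothetical name for the option added in this PR
    },
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

model = torch.nn.Linear(16, 16)
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```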
This PR uses https://github.com/microsoft/DeepSpeed/pull/1801 @ [d911e67](https://github.com/microsoft/DeepSpeed/pull/1801/commits/d911e672248c99c82993a331b79c635e8ea7cfc5) to sync layer norms: 1. for bf16 weights, 2. for fp32 weights in the bf16 optimizer, 3. for the 2 optimizer states. all_reduce/OP.AVG is used in...
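An illustrative sketch (not the linked PR's code) of the kind of layer-norm averaging described above, using `all_reduce` with `ReduceOp.AVG` (available with the NCCL backend in recent PyTorch):

```python
import torch
import torch.distributed as dist

def sync_layer_norms(model: torch.nn.Module):
    """Average LayerNorm weights and biases across ranks so all replicas agree."""
    for module in model.modules():
        if isinstance(module, torch.nn.LayerNorm):
            for p in (module.weight, module.bias):
                if p is not None:
                    dist.all_reduce(p.data, op=dist.ReduceOp.AVG)
```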
Enable universal checkpoint for ZeRO stage 1. Support fp16 training. Fix TP dimension expansion bug. @stas00, FYI
This PR is to help reduce the burden of supporting deep learning accelerators in DeepSpeed. We expect at least two concrete benefits from it: 1. Adding and maintaining accelerator logic will...
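A short sketch of the device-agnostic call sites this kind of abstraction enables, assuming DeepSpeed's `get_accelerator()` interface; the method names reflect the accelerator API rather than this PR's exact diff:

```python
from deepspeed.accelerator import get_accelerator

acc = get_accelerator()
print(acc.device_name())             # e.g. "cuda" on NVIDIA GPUs, another backend name elsewhere
device = acc.current_device_name()   # replaces hardcoded "cuda:<rank>" strings
acc.synchronize()                    # replaces direct torch.cuda.synchronize() calls
```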
This PR is a step towards generalizing the universal checkpointing approach, which enables arbitrary reshaping of 3D parallel checkpoints. It eliminates the hardcoding of the BLOOM model architecture in the...
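A purely illustrative sketch of pattern-driven (rather than model-hardcoded) merging of tensor-parallel checkpoint slices; the patterns and concat dimensions below are made-up examples, not the PR's actual tables.

```python
import re
import torch

# Map parameter-name patterns to the dimension along which TP slices are concatenated.
TP_CONCAT_DIM = {
    r".*column_parallel.*\.weight$": 0,  # column-parallel weights: concat along rows
    r".*row_parallel.*\.weight$": 1,     # row-parallel weights: concat along columns
}

def merge_tp_slices(param_name: str, slices: list) -> torch.Tensor:
    for pattern, dim in TP_CONCAT_DIM.items():
        if re.match(pattern, param_name):
            return torch.cat(slices, dim=dim)
    return slices[0]  # replicated parameters: any rank's copy is fine
```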
Fix #2789 @stas00
Enable tensor fragmentation in ZeRO stages 2 & 3. Fix #2290. TODO: test w/ offload (at least CPU to start with); test w/o offload.
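A sketch of how the fragment access might be used from training code, assuming DeepSpeed's tensor-fragment helpers in `deepspeed.utils`; see the PR for the exact API surface it enables for stages 2 & 3.

```python
import torch
from deepspeed.utils import (
    safe_get_full_fp32_param,
    safe_get_full_grad,
    safe_get_full_optimizer_state,
)

def dump_fragments(model: torch.nn.Module):
    for name, lp_param in model.named_parameters():
        hp = safe_get_full_fp32_param(lp_param)                        # fp32 master weight
        grad = safe_get_full_grad(lp_param)                            # fp32 gradient
        exp_avg = safe_get_full_optimizer_state(lp_param, "exp_avg")   # Adam first moment
        if hp is not None:
            print(f"{name}: weight-norm={hp.norm().item():.4f}")
```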
Reduce unit test I/O size in an attempt to fix the unit test hangs we observe on the GitHub CPU-only CI system.
https://github.com/microsoft/DeepSpeed/blob/0a73e6e6137a91e1b776f725b637f8b37a75f8e7/deepspeed/runtime/zero/stage3.py#L1168 Only works for float32.
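An illustrative pattern for the dtype concern above (not the actual stage3.py code): allocate flat buffers with the incoming parameters' dtype and device instead of assuming `torch.float32`.

```python
import torch

def flat_buffer_like(params):
    """Allocate a flat buffer matching the params' dtype/device rather than hardcoding float32."""
    numel = sum(p.numel() for p in params)
    return torch.zeros(numel, dtype=params[0].dtype, device=params[0].device)
```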