Olatunji Ruwase

Results: 20 issues by Olatunji Ruwase

Configuration option to enable fused fp16 optimizer.

This PR uses https://github.com/microsoft/DeepSpeed/pull/1801 @ [d911e67](https://github.com/microsoft/DeepSpeed/pull/1801/commits/d911e672248c99c82993a331b79c635e8ea7cfc5) to sync layer norms: 1. for bf16 weights, 2. for fp32 weights in the bf16 optimizer, 3. for the 2 optimizer states. all_reduce/OP.AVG is used in...
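For context, a minimal sketch of that averaging step, assuming PyTorch's `torch.distributed` with an already-initialized process group; selecting layer-norm parameters by name substring is an illustration, not the PR's exact filter:

```python
import torch.distributed as dist

def sync_layer_norms(model):
    """Average layer-norm parameters across ranks.

    Illustrative sketch: matching parameters on the substring
    "layernorm" is an assumption, not the PR's exact logic.
    """
    for name, param in model.named_parameters():
        if "layernorm" in name.lower():
            # ReduceOp.AVG sums across ranks and divides by world size,
            # in place; it requires the NCCL backend (PyTorch >= 1.11).
            dist.all_reduce(param.data, op=dist.ReduceOp.AVG)
```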

Enable universal checkpoint for ZeRO stage 1:
- Support fp16 training
- Fix TP dimension expansion bug (see the sketch below)

@stas00, FYI
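A hedged sketch of the TP expansion issue: when a full weight is reconstructed from tensor-parallel shards, the shards must be concatenated along the dimension that was actually partitioned. `merge_tp_shards` and the shapes here are hypothetical, not the PR's code:

```python
import torch

def merge_tp_shards(shards, partition_dim):
    # Concatenate TP shards along the partitioned dimension; picking the
    # wrong dimension yields a mis-shaped (silently wrong) full tensor.
    return torch.cat(shards, dim=partition_dim)

# Two column-parallel shards of shape (2048, 4096) merge into a
# (4096, 4096) weight; dim=1 would instead produce (2048, 8192).
full = merge_tp_shards([torch.zeros(2048, 4096) for _ in range(2)], partition_dim=0)
assert full.shape == (4096, 4096)
```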

This PR helps reduce the burden of supporting deep learning accelerators in DeepSpeed. We expect at least two concrete benefits: 1. Adding and maintaining accelerator logic will...
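A minimal sketch of what such an abstraction could look like, assuming an abstract base class with per-device implementations; the class and method names are illustrative, not DeepSpeed's actual accelerator API:

```python
from abc import ABC, abstractmethod

import torch

class Accelerator(ABC):
    """Hypothetical device abstraction: the runtime calls this interface
    instead of torch.cuda directly, so a new accelerator only needs to
    provide an implementation rather than patch the core code."""

    @abstractmethod
    def device_name(self, index=None) -> str:
        ...

    @abstractmethod
    def synchronize(self, index=None) -> None:
        ...

class CudaAccelerator(Accelerator):
    def device_name(self, index=None) -> str:
        return "cuda" if index is None else f"cuda:{index}"

    def synchronize(self, index=None) -> None:
        torch.cuda.synchronize(index)
```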

This PR is a step towards generalizing the universal checkpointing approach, which enables arbitrary reshapes of 3D-parallel checkpoints. It eliminates the hardcoding of the BLOOM model architecture in the...
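One way to read "eliminating the hardcoding": keying the reshape rules by parameter-name patterns rather than by a fixed architecture. The rule table below is purely hypothetical, not the PR's schema:

```python
import re

# Hypothetical rule table: each regex classifies a parameter and records
# how (or whether) it is partitioned, so no architecture is special-cased.
PARAM_RULES = {
    r".*word_embeddings\.weight": {"partition_dim": 0},
    r".*dense\.weight": {"partition_dim": 1},
    r".*layernorm.*": {"replicated": True},
}

def lookup_rule(param_name):
    for pattern, rule in PARAM_RULES.items():
        if re.fullmatch(pattern, param_name):
            return rule
    return {"replicated": True}  # default: treat as unpartitioned
```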

Enable tensor fragmentation in ZeRO stages 2 & 3. Fixes #2290.
TODO:
- test w/ offload (at least CPU to start with)
- test w/o offload
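A rough sketch of the fragmentation idea, under the assumption that ZeRO flattens parameters into one contiguous buffer and each rank owns an equal slice; `partition_flat_params` is a hypothetical helper, not the stage 2/3 implementation:

```python
import torch
import torch.nn.functional as F

def partition_flat_params(params, world_size, rank):
    # Flatten all parameters into one contiguous 1-D buffer.
    flat = torch.cat([p.detach().reshape(-1) for p in params])
    # Pad so the buffer divides evenly into world_size fragments.
    frag_size = (flat.numel() + world_size - 1) // world_size
    padded = F.pad(flat, (0, frag_size * world_size - flat.numel()))
    # narrow() returns a view into the padded buffer, so writes to the
    # fragment update the underlying storage this rank owns.
    return padded.narrow(0, rank * frag_size, frag_size)
```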

Reduce unit test I/O size in an attempt to fix the unit test hangs that we observe on the GitHub CPU-only CI system.

The code at https://github.com/microsoft/DeepSpeed/blob/0a73e6e6137a91e1b776f725b637f8b37a75f8e7/deepspeed/runtime/zero/stage3.py#L1168 only works for float32.
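Assuming the issue is a hardcoded 4-byte element size (true only of float32), a dtype-agnostic fix could look like the sketch below; `buffer_size_bytes` is hypothetical and not the code at that line:

```python
import torch

def buffer_size_bytes(tensor: torch.Tensor) -> int:
    # element_size() reports bytes per element from the dtype, so
    # fp16/bf16 (2), fp32 (4), and fp64 (8) are all handled uniformly.
    return tensor.numel() * tensor.element_size()
```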

Labels: bug, training