Olatunji Ruwase
Fix #2071
Add a configuration option to enable the fused fp16 optimizer.
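A minimal sketch of where such an option would live in a DeepSpeed config. The standard `fp16.enabled` switch is real; the `fused` key shown commented out is only a placeholder name for the new option this PR introduces, not a confirmed setting.

```python
import torch
import deepspeed

# Sketch: standard fp16 config plus a placeholder for the new fused-optimizer option.
ds_config = {
    "train_batch_size": 8,
    "fp16": {
        "enabled": True,   # standard fp16 switch
        # "fused": True,   # hypothetical name for the option added in this PR
    },
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

model = torch.nn.Linear(16, 16)
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```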
This PR uses https://github.com/microsoft/DeepSpeed/pull/1801 @ [d911e67](https://github.com/microsoft/DeepSpeed/pull/1801/commits/d911e672248c99c82993a331b79c635e8ea7cfc5) to sync layer norms: 1. for bf16 weights, 2. for fp32 weights in the bf16 optimizer, 3. for the 2 optimizer states. all_reduce/OP.AVG is used in...
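An illustrative sketch (not the linked PR's code) of the kind of layer-norm averaging described above, using `all_reduce` with `ReduceOp.AVG` (available with the NCCL backend in recent PyTorch):

```python
import torch
import torch.distributed as dist

def sync_layer_norms(model: torch.nn.Module):
    """Average LayerNorm weights and biases across ranks so all replicas agree."""
    for module in model.modules():
        if isinstance(module, torch.nn.LayerNorm):
            for p in (module.weight, module.bias):
                if p is not None:
                    dist.all_reduce(p.data, op=dist.ReduceOp.AVG)
```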
Enable universal checkpoint for ZeRO stage 1. Support fp16 training. Fix TP dimension expansion bug. @stas00, FYI
This PR is to help reduce the burden of supporting deep learning accelerators in DeepSpeed. We expect at least two concrete benefits from it: 1. Adding and maintaining accelerator logic will...
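A short sketch of the device-agnostic call sites this kind of abstraction enables, assuming DeepSpeed's `get_accelerator()` interface; the method names reflect the accelerator API rather than this PR's exact diff:

```python
from deepspeed.accelerator import get_accelerator

acc = get_accelerator()
print(acc.device_name())             # e.g. "cuda" on NVIDIA GPUs, another backend name elsewhere
device = acc.current_device_name()   # replaces hardcoded "cuda:<rank>" strings
acc.synchronize()                    # replaces direct torch.cuda.synchronize() calls
```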
This PR is a step towards generalizing the universal checkpointing approach, which enables arbitrary reshaping of 3D parallel checkpoints. It eliminates the hardcoding of the BLOOM model architecture in the...
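A purely illustrative sketch of pattern-driven (rather than model-hardcoded) merging of tensor-parallel checkpoint slices; the patterns and concat dimensions below are made-up examples, not the PR's actual tables.

```python
import re
import torch

# Map parameter-name patterns to the dimension along which TP slices are concatenated.
TP_CONCAT_DIM = {
    r".*column_parallel.*\.weight$": 0,  # column-parallel weights: concat along rows
    r".*row_parallel.*\.weight$": 1,     # row-parallel weights: concat along columns
}

def merge_tp_slices(param_name: str, slices: list) -> torch.Tensor:
    for pattern, dim in TP_CONCAT_DIM.items():
        if re.match(pattern, param_name):
            return torch.cat(slices, dim=dim)
    return slices[0]  # replicated parameters: any rank's copy is fine
```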
Fix #2789 @stas00
Enable tensor fragmentation in ZeRO stages 2 & 3. Fix #2290. TODO: test w/ offload (at least CPU to start with); test w/o offload.
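A sketch of how the fragment access might be used from training code, assuming DeepSpeed's tensor-fragment helpers in `deepspeed.utils`; see the PR for the exact API surface it enables for stages 2 & 3.

```python
import torch
from deepspeed.utils import (
    safe_get_full_fp32_param,
    safe_get_full_grad,
    safe_get_full_optimizer_state,
)

def dump_fragments(model: torch.nn.Module):
    for name, lp_param in model.named_parameters():
        hp = safe_get_full_fp32_param(lp_param)                        # fp32 master weight
        grad = safe_get_full_grad(lp_param)                            # fp32 gradient
        exp_avg = safe_get_full_optimizer_state(lp_param, "exp_avg")   # Adam first moment
        if hp is not None:
            print(f"{name}: weight-norm={hp.norm().item():.4f}")
```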
Reduce unit test I/O size in an attempt to fix the unit test hangs we observe on the GitHub CPU-only CI system.
https://github.com/microsoft/DeepSpeed/blob/0a73e6e6137a91e1b776f725b637f8b37a75f8e7/deepspeed/runtime/zero/stage3.py#L1168 Only works for float32.
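An illustrative pattern for the dtype concern above (not the actual stage3.py code): allocate flat buffers with the incoming parameters' dtype and device instead of assuming `torch.float32`.

```python
import torch

def flat_buffer_like(params):
    """Allocate a flat buffer matching the params' dtype/device rather than hardcoding float32."""
    numel = sum(p.numel() for p in params)
    return torch.zeros(numel, dtype=params[0].dtype, device=params[0].device)
```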