Xinyu Lian
Results: 2 issues of Xinyu Lian
This PR enables universal checkpointing for ZeRO Stage 3. Notes: - The current implementation supports data parallelism. - Development is ongoing for universal checkpoint Stage 3 with tensor-slicing model...
This PR resolves [Issue-5430](https://github.com/microsoft/DeepSpeed/issues/5430). It enables the universal checkpoint feature on other platforms, such as HuggingFace Trainer, without requiring changes to the HuggingFace code. It does this by creating...