Xinyu Lian

Results: 7 comments by Xinyu Lian

@microsoft-github-policy-service agree
> On Apr 29, 2024, at 12:59 AM, microsoft-github-policy-service[bot] wrote: @xylian86 please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the...

@Orion-Zheng Could you provide the scripts you used for training? I would be happy to help solve the issue.

Convergence curve for ZeRO 3 using the current implementation

@ArtificialZeng We have released examples in the Megatron-DeepSpeed repository. You can find them at: https://github.com/microsoft/Megatron-DeepSpeed/tree/main/examples_deepspeed/universal_checkpointing. Please let us know if you encounter any issues.

@Orion-Zheng This PR should fix the issue you mentioned (universal checkpoint does not support HF trainer). Feel free to ping me if you have any questions or suggestions on this...

@tjruwase Yes, for sure. @xiyang-aads-lilly Could you please try the following?
1. Install the latest DeepSpeed version (v0.14.5 patch release).
2. Add the argument `inject_missing_state` when you run the conversion.
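For readers following the two steps above, here is a minimal sketch of how the conversion command might be assembled. Only `inject_missing_state` comes from the comment itself. The script name `ds_to_universal.py`, the `--input_folder` and `--output_folder` flags, the leading dashes on the flag, and the checkpoint paths are all assumptions for illustration, so check them against the DeepSpeed version you installed.

```python
import shlex


def build_conversion_command(input_folder, output_folder, script="ds_to_universal.py"):
    """Build the argv list for a universal-checkpoint conversion.

    Assumptions (not confirmed by the comment above): the script name and the
    --input_folder / --output_folder flags. Only inject_missing_state is
    mentioned in the thread.
    """
    return [
        "python", script,
        "--input_folder", input_folder,
        "--output_folder", output_folder,
        "--inject_missing_state",  # the argument suggested in step 2
    ]


# Hypothetical checkpoint paths, used only to show the shape of the command.
cmd = build_conversion_command(
    "checkpoints/global_step100",
    "checkpoints/global_step100_universal",
)

# Print a shell-ready string instead of running it, so nothing is executed here.
print(shlex.join(cmd))
```

The sketch only prints the command. Once the flags are verified, the same list could be passed to `subprocess.run(cmd, check=True)`.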

Closing this PR, as I have opened a new one, [PR-5608](https://github.com/microsoft/DeepSpeed/pull/5608), with the new implementation that @tjruwase suggested.