DeepSpeed
Create minimal universal checkpoint info for client state
This PR resolves Issue-5430.
The PR enables the universal checkpoint feature for other platforms, such as HuggingFace Trainer, without requiring changes to the HuggingFace code. It does this by adding minimal universal checkpoint info (specifically, the version) to the client state by default.
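As a rough illustration of the idea described above (not the code from this PR), the default could be sketched as follows. The helper name `with_default_universal_info`, the key names, and the version value are all assumptions chosen for the example; the actual constants and call site in DeepSpeed may differ.

```python
from typing import Any, Dict, Optional

# Hypothetical key names and version value, for illustration only.
UNIVERSAL_CHECKPOINT_INFO = "universal_checkpoint_info"
UNIVERSAL_CHECKPOINT_VERSION_KEY = "universal_checkpoint_version"
UNIVERSAL_CHECKPOINT_VERSION_VALUE = 0.2


def with_default_universal_info(
    client_state: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return a copy of client_state that carries minimal universal checkpoint info.

    If the caller (for example, a trainer that knows nothing about universal
    checkpoints) did not supply the info, a default containing only the
    version is added. Info supplied by the caller is left untouched.
    """
    # Shallow-copy so the caller's dictionary is not modified.
    state = dict(client_state or {})
    state.setdefault(
        UNIVERSAL_CHECKPOINT_INFO,
        {UNIVERSAL_CHECKPOINT_VERSION_KEY: UNIVERSAL_CHECKPOINT_VERSION_VALUE},
    )
    return state
```

With a default like this, a caller that passes no client state, or a client state without the universal checkpoint entry, would still produce a checkpoint that records the version.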
@xylian86, thanks for this great work. Can you please add convergence curves of an HF model as a demo?
Closing this PR, as I have opened a new one, PR-5608, with the new implementation that @tjruwase suggested.