create minimal universal checkpoint info for client state

xylian86 opened this issue 9 months ago • 1 comment

This PR resolves Issue-5430.

The PR enables the universal checkpoint feature for other platforms, such as the HuggingFace Trainer, without requiring changes to the HuggingFace code. It does so by creating minimal universal checkpoint info (specifically, the version) in the client state by default.
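To illustrate the idea of "minimal universal checkpoint info as a default for the client state", here is a minimal sketch in plain Python. It is not the PR's implementation: the key names, the version value, and the helper `with_default_universal_info` are all hypothetical, chosen for illustration only. The point it shows is that a default entry holding only the version is filled in when the caller (e.g. a training framework that knows nothing about universal checkpoints) did not supply one, while info the caller did supply is left untouched.

```python
from typing import Any, Dict, Optional

# Hypothetical key names and version value, for illustration only;
# the constants actually used by DeepSpeed may differ.
UNIVERSAL_CHECKPOINT_INFO = "universal_checkpoint_info"
UNIVERSAL_CHECKPOINT_VERSION_KEY = "universal_checkpoint_version"
UNIVERSAL_CHECKPOINT_VERSION_VALUE = 0.2


def with_default_universal_info(client_state: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a shallow copy of client_state that always carries minimal
    universal checkpoint info (just the version).

    Hypothetical helper: existing info supplied by the caller is never
    overwritten, and the caller's dict is not mutated.
    """
    state = dict(client_state or {})
    # Fill in the default only when the client did not provide its own info.
    state.setdefault(
        UNIVERSAL_CHECKPOINT_INFO,
        {UNIVERSAL_CHECKPOINT_VERSION_KEY: UNIVERSAL_CHECKPOINT_VERSION_VALUE},
    )
    return state


if __name__ == "__main__":
    state = with_default_universal_info({"epoch": 3})
    print(state[UNIVERSAL_CHECKPOINT_INFO])  # {'universal_checkpoint_version': 0.2}
```

Using `setdefault` on a copy keeps the default purely additive: a client that already writes richer universal checkpoint info keeps it, and one that writes nothing still ends up with a version entry.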

xylian86, May 13 '24 06:05

@xylian86, thanks for this great work. Can you please add convergence curves of an HF model as a demo?

tjruwase, May 13 '24 09:05

Closing this PR, as I have opened a new one, PR-5608, with the new implementation that @tjruwase suggested.

xylian86, Jun 03 '24 15:06