DeepSpeed
[REQUEST] A minimal example to load universal checkpoint
So far, I have not found a minimal example of resuming training from a universal checkpoint. The only example that uses universal checkpoints is here, and it is buried under layers of Megatron code.
Describe the solution you'd like
A minimal example that uses only DeepSpeed and PyTorch, so that people using other frameworks such as PyTorch Lightning or Hugging Face can integrate this solution.
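To make the request concrete, here is a rough sketch of the shape I have in mind. It is not a verified recipe: I am assuming the checkpoint was already converted with the `ds_to_universal.py` script, and that setting `"load_universal": true` in the `checkpoint` section of the config is what tells the engine to read the universal format. The helper names `build_ds_config` and `resume_from_universal`, the optimizer settings, the ZeRO stage, and the `ckpt_dir` / `tag` arguments are placeholders of mine.

```python
def build_ds_config(load_universal: bool = True) -> dict:
    """Build a DeepSpeed config dict.

    Everything except the "checkpoint" entry is a placeholder value;
    substitute the settings used by the original training run.
    """
    return {
        "train_micro_batch_size_per_gpu": 1,
        "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
        "bf16": {"enabled": True},
        "zero_optimization": {"stage": 1},
        # Assumption: this flag switches checkpoint loading to the universal format.
        "checkpoint": {"load_universal": load_universal},
    }


def resume_from_universal(model, ckpt_dir: str, tag: str):
    """Wrap a plain torch.nn.Module in a DeepSpeed engine and load a checkpoint.

    Assumes `ckpt_dir` contains a sub-folder named `tag` that was produced
    beforehand by the universal checkpoint conversion script.
    """
    import deepspeed  # imported lazily so the config helper works without it

    engine, _, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=build_ds_config(load_universal=True),
    )
    engine.load_checkpoint(ckpt_dir, tag=tag)
    return engine
```

An example in the docs along these lines, with the conversion step and the expected folder layout spelled out, would be enough for Lightning or Hugging Face users to adapt.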