
Speeding up loading of inference checkpoints

amritap-ef opened this issue 11 months ago • 2 comments

Hi,

I saw this pull request in the DeepSpeed library about snapshotting an engine to load large models faster, but I couldn't find any documentation on it: https://github.com/microsoft/DeepSpeed/pull/4664

How can I save and load inference checkpoints for my own model faster with DeepSpeed-FastGen?

04/03/24: Updated issue description to be clearer

amritap-ef avatar Mar 01 '24 10:03 amritap-ef

Hi, you can refer to these docs and code examples (a minimal pipeline sketch follows the list):

  • https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen
  • https://github.com/microsoft/DeepSpeedExamples/tree/master/inference/mii
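
For instance, the non-persistent pipeline from those examples boils down to roughly the following (the model id is just a placeholder):

```python
import mii

# Build a non-persistent pipeline; any HuggingFace model id works here.
pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")

# Run generation on a batch of prompts.
response = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=128)
print(response)
```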

For adding new, unsupported models:

  • https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/inference/v2/model_implementations/AddingAModel.md

For loading local Hugging Face checkpoints, you can pass the absolute directory path to the pipeline, as in the sketch below.
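
Something like this (the path is a placeholder), using the same `mii.pipeline` call as above:

```python
import mii

# Point the pipeline at an absolute local directory instead of a model id.
pipe = mii.pipeline("/abs/path/to/my-finetuned-checkpoint")
```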

ZonePG avatar Mar 02 '24 17:03 ZonePG

Thanks for sending this through - apologies, I didn't explain this very well.

What I was actually asking is whether there is a way to reduce the loading time for my own finetuned models from HuggingFace checkpoints, as I'm finding that the default models load much faster.

In particular, this PR https://github.com/microsoft/DeepSpeed/pull/4664 mentions adding the 'capability to snapshot an engine and resume from it', so I was wondering how I might save and load that engine to cut down the time it takes to load a non-persistent pipeline the first time.
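
Purely as a guess from skimming the PR diff, I'd imagine a flow along these lines - none of the names below are documented MII/DeepSpeed API, so treat them all as assumptions:

```python
# Hypothetical sketch based on PR #4664; build_hf_engine, serialize, and
# build_engine_from_ds_checkpoint are assumptions taken from the PR, not
# documented API, and the import paths may differ.
from deepspeed.inference.v2.config_v2 import RaggedInferenceEngineConfig
from deepspeed.inference.v2.engine_factory import (
    build_hf_engine,
    build_engine_from_ds_checkpoint,
)

engine_config = RaggedInferenceEngineConfig()

# First run: build the engine from the (slow-to-load) HF checkpoint,
# then snapshot the converted engine state to disk.
engine = build_hf_engine("/abs/path/to/my-finetuned-checkpoint", engine_config)
engine.serialize("/abs/path/to/ds-snapshot")  # assumed snapshot call

# Later runs: resume directly from the snapshot, skipping conversion.
engine = build_engine_from_ds_checkpoint("/abs/path/to/ds-snapshot",
                                         engine_config)
```

Is that roughly how it's meant to be used, and is there a way to hook it into `mii.pipeline`?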

amritap-ef avatar Mar 04 '24 12:03 amritap-ef