
Are there plans to support DeepSpeed-Inference?

Open jinchihe opened this issue 2 years ago • 3 comments

/kind feature

Describe the solution you'd like

DeepSpeed-Inference introduces several features to efficiently serve transformer-based PyTorch models. It supports model parallelism (MP) to fit large models that would otherwise not fit in GPU memory. Even for smaller models, MP can be used to reduce inference latency.

Personally, I think it would be great to support DeepSpeed-Inference, or to build examples/flows for it in kserve. Comments?

Anything else you would like to add:

jinchihe avatar Jul 10 '23 09:07 jinchihe

@jinchihe you can run DeepSpeed inference with TorchServe on KServe: https://pytorch.org/serve/large_model_inference.html#deepspeed. @cmaddalozzo is also working on native support on the KServe side.
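For anyone following the TorchServe route above: per the linked large-model-inference doc, DeepSpeed is enabled through a `model-config.yaml` packaged into the model archive. A minimal sketch is below — the file names, GPU count, and handler details are hypothetical, and the exact field schema should be checked against the linked doc for your TorchServe version:

```yaml
# model-config.yaml, packaged with torch-model-archiver (sketch, not authoritative)
parallelType: "custom"      # let the handler manage model parallelism
deviceType: "gpu"
torchrun:
  nproc-per-node: 2         # one process per GPU used for model parallelism
deepspeed:
  config: ds-config.json    # DeepSpeed inference config (dtype, tensor parallel size, ...)
  checkpoint: checkpoints.json  # optional checkpoint description file
```

The archive produced this way can then be served through KServe's TorchServe (pytorch) predictor.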

yuzisun avatar Jul 11 '23 13:07 yuzisun

Wow! Great news! Thanks @yuzisun @cmaddalozzo

jinchihe avatar Jul 16 '23 00:07 jinchihe

@yuzisun May I ask if there is a complete example of deploying a DeepSpeed model as an inference service using kserve?

FlyAIBox avatar Dec 26 '23 09:12 FlyAIBox