
[REQUEST] Inference-Optimized Pipeline Parallelism

Open champson opened this issue 2 years ago • 3 comments

The paper https://arxiv.org/abs/2207.00032 describes pipeline parallelism for DeepSpeed inference, including hybrid scheduling, activation offloading, and communication optimizations, which reportedly led to significant performance improvements. Does the current DeepSpeed release actually support these features? If not, is there a timeline for when they will be supported?
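For readers unfamiliar with the idea, here is a toy sketch of what pipeline-parallel inference scheduling means. This is not DeepSpeed's API and not the paper's hybrid schedule; the stage functions and the tick-based scheduler below are hypothetical, purely to illustrate how micro-batches overlap across stages:

```python
# Toy illustration of pipeline-parallel inference (NOT DeepSpeed's actual API).
# A model is split into sequential stages; micro-batches flow through the
# stages so that, in steady state, different stages work on different
# micro-batches during the same scheduling "tick".

def make_stage(scale):
    # Hypothetical per-stage computation: scale every element of the batch.
    return lambda batch: [x * scale for x in batch]

def pipelined_inference(stages, micro_batches):
    """Run micro-batches through stages tick by tick; return outputs and
    a (tick, stage, micro_batch_index) trace of the schedule."""
    n_stages, n_mb = len(stages), len(micro_batches)
    in_flight = [None] * n_stages   # micro-batch currently held by each stage
    outputs, schedule = [], []
    next_mb = 0
    for tick in range(n_stages + n_mb - 1):
        # Feed the next micro-batch into the first stage when it is free.
        if next_mb < n_mb and in_flight[0] is None:
            in_flight[0] = (next_mb, micro_batches[next_mb])
            next_mb += 1
        # Process stages back-to-front so each micro-batch advances exactly
        # one stage per tick.
        for s in range(n_stages - 1, -1, -1):
            if in_flight[s] is None:
                continue
            mb_idx, data = in_flight[s]
            schedule.append((tick, s, mb_idx))
            data = stages[s](data)
            in_flight[s] = None
            if s + 1 < n_stages:
                in_flight[s + 1] = (mb_idx, data)
            else:
                outputs.append(data)
    return outputs, schedule

stages = [make_stage(2), make_stage(3)]
outs, sched = pipelined_inference(stages, [[1, 2], [3, 4]])
print(outs)   # both micro-batches scaled by 2 then 3
print(sched)  # at tick 1, stage 1 runs micro-batch 0 while stage 0 runs micro-batch 1
```

The trace shows the key property: once the pipeline fills, all stages are busy on the same tick, which is what makes pipelining attractive; the paper's hybrid schedule additionally interleaves prompt and token-generation micro-batches, which this sketch does not model.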

champson avatar Jul 12 '23 09:07 champson

I'm also very interested in this issue. It would be great to get a clear response from the community on this matter. Thanks!

yefanhust avatar Jul 14 '23 07:07 yefanhust

@champson, @yefanhust are there specific models or scenarios you are looking to apply pipeline parallelism to? The scenarios where PP is helpful for inference are very narrow, and it is applicable in just a handful of cases currently, so we have de-prioritized releasing these features. But we can revisit this if there is strong interest in the community.

samyam avatar Jul 27 '23 17:07 samyam

Hi! I am also interested in this feature described in the paper. Is there any demo or tutorial for the 'hybrid pipeline inference schedule'?

Noblezhong avatar Oct 11 '24 03:10 Noblezhong