[REQUEST] Inference-Optimized Pipeline Parallelism
As described in the paper https://arxiv.org/abs/2207.00032, DeepSpeed inference supports pipeline parallelism, including hybrid scheduling, activation offloading, and communication optimizations, which led to significant performance improvements. However, does the current DeepSpeed release actually support these features? If not, is there a timeline for when they will be supported?
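For context on what pipeline-parallel inference does: the model is split into consecutive stages, and the input batch is split into micro-batches that are streamed through the stages so that different stages can work on different micro-batches at the same time. Below is a minimal toy sketch of that schedule in plain Python, purely illustrative; it is not DeepSpeed's API, and all function names here (`make_stages`, `pipelined_inference`) are hypothetical:

```python
# Toy sketch of pipeline-parallel inference with micro-batching.
# NOTE: illustrative only -- not DeepSpeed's actual API or scheduler.

def make_stages(num_stages):
    # Each stage applies a simple transform standing in for a model partition;
    # stage s adds s to every element.
    return [lambda x, s=s: [v + s for v in x] for s in range(num_stages)]

def pipelined_inference(batch, num_stages=2, num_micro_batches=4):
    stages = make_stages(num_stages)
    # Split the batch into micro-batches so stages can overlap work.
    size = len(batch) // num_micro_batches
    micro = [batch[i * size:(i + 1) * size] for i in range(num_micro_batches)]
    outputs = [None] * num_micro_batches
    # Simulate the schedule: at each tick, stage s processes micro-batch
    # (tick - s), so in a real system all stages would run concurrently.
    # Iterating stages in reverse keeps each micro-batch from advancing
    # more than one stage per tick.
    for tick in range(num_micro_batches + num_stages - 1):
        for s in reversed(range(num_stages)):
            m = tick - s
            if 0 <= m < num_micro_batches:
                data = micro[m] if s == 0 else outputs[m]
                outputs[m] = stages[s](data)
    return [v for mb in outputs for v in mb]
```

With 2 stages, 4 micro-batches, and a batch of 8 items, the full batch takes 5 ticks instead of the 8 stage-passes a purely sequential schedule would need, which is the latency/throughput win the paper's hybrid scheduling builds on.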
I'm also very interested in this issue. It would be great to get a clear response from the community on this matter. Thanks!
@champson, @yefanhust are there specific models/scenarios you are looking to apply pipeline parallelism to? The scenarios in which PP helps inference are very narrow and currently applicable in just a handful of cases, so we have de-prioritized releasing these features. But we can revisit this if there is strong interest in the community.
Hi! I am also interested in the feature described in this paper. Is there any demo or tutorial for the 'hybrid pipeline inference schedule'?