
Does it support lora and pipeline parallel now?

Open blldd opened this issue 2 years ago • 3 comments

I found that the memory usage is very large even when using ZeRO-3 and LoRA, so I was wondering whether pipeline parallelism or tensor parallelism is supported?

blldd avatar Apr 13 '23 09:04 blldd

LoRA is supported. Pipeline parallelism and tensor parallelism are not supported yet.

yaozhewei avatar Apr 13 '23 15:04 yaozhewei
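For reference, a minimal sketch of a ZeRO-3 configuration that also offloads parameters and optimizer state to CPU, which is one common way to cut GPU memory further when PP/TP is unavailable. The key names follow the public DeepSpeed JSON config schema; the specific values here are illustrative assumptions, not a recommended setup:

```python
# Hypothetical minimal DeepSpeed ZeRO-3 config (dict form of the JSON schema).
# Offload settings are optional; they trade GPU memory for CPU<->GPU traffic.
ds_config = {
    "train_batch_size": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                                # full parameter/grad/optimizer sharding
        "offload_param": {"device": "cpu"},        # shard params to CPU when idle
        "offload_optimizer": {"device": "cpu"},    # keep optimizer state on CPU
    },
}
```

A training script would typically pass this to `deepspeed.initialize(model=model, config=ds_config)` along with the model; LoRA itself is orthogonal to this config and is applied to the model before initialization.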

@yaozhewei Hi, a quick question: are there official plans to support "finetune with pipeline parallelism and tensor parallelism"? My understanding is that when the model is very large (e.g. BLOOM-176B), it needs to be distributed across nodes, which would require PP support. Or can the ZeRO strategy, in principle, already shard parameters and activations across nodes?

taishiciR avatar Apr 17 '23 07:04 taishiciR

We tested up to OPT-175B and the current framework works well. We will discuss internally about the support of PP and TP :)

yaozhewei avatar Apr 18 '23 17:04 yaozhewei

Closing the issue since there is no follow-up. Please reopen it if necessary.

yaozhewei avatar Apr 24 '23 19:04 yaozhewei

Hi, do you have any plans on supporting Pipeline parallelism now?

LSX-Sneakerprogrammer avatar Jun 15 '23 07:06 LSX-Sneakerprogrammer

> We tested up to OPT-175B and the current framework works well. We will discuss internally about the support of PP and TP :)

Hi, can you give more info about the number of machines you used and the training efficiency? Thanks!

I am trying to load a 7B actor model and a 7B reward model with ZeRO-3. If I use more than one machine, it becomes very slow (latency 120s -> 1700s).

FlyCarrot avatar Aug 28 '23 08:08 FlyCarrot