
Can PiPPy coexist with DeepSpeed?

Open · leiwen83 opened this issue 2 years ago · 1 comment

Hi,

I want to know whether I can use PiPPy's pipeline-parallel (PP) capability together with DeepSpeed's ZeRO-3 config, so that the two combined give 3D parallelism.

Thx

leiwen83 · May 09 '23 09:05

Hi @leiwen83, that's an interesting question.

I think that at the ZeRO-2 stage (where gradients are sharded), some special arrangement would be needed: as each micro-batch runs its backward pass through a pipeline stage, its gradients need to be accumulated. So one would need to delay ZeRO-2's reduce_scatter of gradients and run it only once, after all micro-batches have passed through that stage's backward.
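To make the "delay the reduce_scatter" idea concrete, here is a minimal pure-Python sketch. It is not PiPPy or DeepSpeed code: the data-parallel ranks and micro-batches are just nested lists, `reduce_scatter` is a simulated collective that sums gradients element-wise and hands each rank one contiguous shard, and the helper names (`eager_reduce_scatter`, `deferred_reduce_scatter`) are hypothetical. It compares reduce-scattering after every micro-batch with accumulating locally and reduce-scattering once at the end.

```python
from typing import List, Optional, Tuple

Grad = List[float]  # hypothetical: one flat gradient vector for one pipeline stage


def reduce_scatter(per_rank: List[Grad]) -> List[Grad]:
    """Simulated reduce_scatter over data-parallel ranks: sum the gradients
    element-wise, then give rank r only its contiguous shard of the sum."""
    world_size, n = len(per_rank), len(per_rank[0])
    assert n % world_size == 0, "gradient length must split evenly into shards"
    shard = n // world_size
    summed = [sum(g[i] for g in per_rank) for i in range(n)]
    return [summed[r * shard:(r + 1) * shard] for r in range(world_size)]


def eager_reduce_scatter(micro_grads: List[List[Grad]]) -> Tuple[List[Grad], int]:
    """Reduce-scatter after every micro-batch's backward and add the result
    into each rank's sharded accumulator. Returns (shards, #collectives)."""
    shards: Optional[List[Grad]] = None
    calls = 0
    for per_rank in micro_grads:  # micro_grads[m][r]: micro-batch m on rank r
        out = reduce_scatter(per_rank)
        calls += 1
        if shards is None:
            shards = out
        else:
            shards = [[a + b for a, b in zip(s, o)] for s, o in zip(shards, out)]
    return shards, calls


def deferred_reduce_scatter(micro_grads: List[List[Grad]]) -> Tuple[List[Grad], int]:
    """Accumulate full (unsharded) gradients locally on each rank across all
    micro-batches, then reduce-scatter exactly once at the end."""
    world_size, n = len(micro_grads[0]), len(micro_grads[0][0])
    local = [[0.0] * n for _ in range(world_size)]
    for per_rank in micro_grads:
        for r in range(world_size):
            local[r] = [a + b for a, b in zip(local[r], per_rank[r])]
    return reduce_scatter(local), 1


if __name__ == "__main__":
    # 3 micro-batches x 2 data-parallel ranks x 4 gradient elements
    micro_grads = [
        [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]],
        [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]],
        [[0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0]],
    ]
    print(eager_reduce_scatter(micro_grads))     # ([[5.5, 6.5], [7.5, 8.5]], 3)
    print(deferred_reduce_scatter(micro_grads))  # ([[5.5, 6.5], [7.5, 8.5]], 1)
```

Both variants end with the same gradient shard on each rank. In this sketch, the deferred variant issues one collective instead of one per micro-batch, but each rank holds a full-size gradient buffer for the stage until that single reduce_scatter runs.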

cc @rohan-varma: do you have any additional thoughts?

kwen2501 · May 12 '23 21:05