Request for training examples using PipeStage and PipeSchedule

Open · dheerj188 opened this issue 11 months ago · 3 comments

Would it be possible to provide a few examples of how to train a network using PipeStage and PipeSchedule? All the examples I have gone through so far are for inference only.

dheerj188 · Mar 19 '24 16:03

We are working on consolidating PipelineStage's runtime code with PipelineSchedule's. One outcome of this will be training support. We will create a few examples once that is done. If you'd like an early example, here is a unit test for the backward pass in the consolidated code: https://github.com/pytorch/PiPPy/pull/980/files#diff-a99c7fe997a1aef6c41fadf81ff7fb13e27172dd7f97b87dd81ca7d38cf833a9

kwen2501 · Mar 19 '24 17:03

Here is a simple training example: https://github.com/pytorch/PiPPy/blob/main/examples/basic/example_train.py
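For readers who want the idea before opening the linked file: a training schedule drives every microbatch forward through all stages, then backward in reverse stage order, accumulating weight gradients before a single optimizer step. The snippet below is a toy, single-process sketch of that GPipe-style fill/drain pattern. It is NOT the PiPPy API; `ScalarStage` and `train_step` are hypothetical helpers, each stage is just a scalar weight `y = w * x`, and the loss is assumed to be mean squared error over the full batch.

```python
from dataclasses import dataclass, field


@dataclass
class ScalarStage:
    """Hypothetical pipeline stage: y = w * x applied to each sample."""
    w: float
    grad: float = 0.0
    saved: dict = field(default_factory=dict)  # stashed inputs, keyed by microbatch id

    def forward(self, mb: int, xs: list[float]) -> list[float]:
        self.saved[mb] = xs  # keep inputs for this microbatch's backward pass
        return [self.w * x for x in xs]

    def backward(self, mb: int, grads_out: list[float]) -> list[float]:
        xs = self.saved.pop(mb)
        self.grad += sum(g * x for g, x in zip(grads_out, xs))  # accumulate dL/dw
        return [g * self.w for g in grads_out]  # dL/dx, passed to the previous stage


def train_step(stages, inputs, targets, n_microbatches, lr):
    """One GPipe-like step; assumes len(inputs) is divisible by n_microbatches."""
    n = len(inputs)
    size = n // n_microbatches
    chunks = [(inputs[i * size:(i + 1) * size], targets[i * size:(i + 1) * size])
              for i in range(n_microbatches)]
    outputs = {}
    # Fill: forward every microbatch through every stage, first to last.
    for mb, (xs, _) in enumerate(chunks):
        for stage in stages:
            xs = stage.forward(mb, xs)
        outputs[mb] = xs
    # Drain: compute the loss, then backward each microbatch from last stage to first.
    loss = 0.0
    for mb, (_, ts) in enumerate(chunks):
        ys = outputs[mb]
        loss += sum(0.5 * (y - t) ** 2 for y, t in zip(ys, ts)) / n  # full-batch MSE
        grads = [(y - t) / n for y, t in zip(ys, ts)]
        for stage in reversed(stages):
            grads = stage.backward(mb, grads)
    # One optimizer (SGD) step per batch, then reset the accumulated gradients.
    weight_grads = [stage.grad for stage in stages]
    for stage in stages:
        stage.w -= lr * stage.grad
        stage.grad = 0.0
    return loss, weight_grads
```

Because gradients are summed across microbatches before the update, the result should match a full-batch step regardless of `n_microbatches`. A real pipeline runtime additionally places each stage on its own rank and overlaps their work, which this sequential toy does not attempt.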

kwen2501 · Mar 25 '24 14:03

@kwen2501 I hope you can provide more complex end-to-end examples. I noticed that PyTorch will add HSDP + TP/SP (FSDP2) as a new feature. It would be great if you could create a 3D (FSDP + TP/SP + PiPPy) or even 4D (HSDP + TP/SP + PiPPy) parallel Hugging Face training demo. It seems that none of these features require modifying the LLM's source code.
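To make "3D parallel" concrete: the world is usually viewed as a PP × DP × TP grid, and each rank joins one process group per dimension. The sketch below assumes a row-major layout, i.e. the same ordering as `torch.arange(world_size).view(pp, dp, tp)`, with TP as the fastest-varying dimension so that TP peers get consecutive ranks. The helper names `coords`, `rank_of` and `groups` are hypothetical and only illustrate the arithmetic; they are not part of PiPPy or PyTorch.

```python
from itertools import product


def coords(rank: int, pp: int, dp: int, tp: int) -> tuple[int, int, int]:
    """Return (pp_idx, dp_idx, tp_idx) for a global rank in a row-major grid."""
    assert 0 <= rank < pp * dp * tp
    return rank // (dp * tp), (rank // tp) % dp, rank % tp


def rank_of(pp_idx: int, dp_idx: int, tp_idx: int, dp: int, tp: int) -> int:
    """Inverse of coords(): grid position back to global rank."""
    return (pp_idx * dp + dp_idx) * tp + tp_idx


def groups(pp: int, dp: int, tp: int, dim: str) -> list[list[int]]:
    """Ranks that communicate along one mesh dimension ("pp", "dp" or "tp")."""
    sizes = {"pp": pp, "dp": dp, "tp": tp}
    others = [d for d in ("pp", "dp", "tp") if d != dim]
    result = []
    # Fix the other two coordinates, then sweep the chosen dimension.
    for fixed in product(range(sizes[others[0]]), range(sizes[others[1]])):
        group = []
        for i in range(sizes[dim]):
            idx = dict(zip(others, fixed))
            idx[dim] = i
            group.append(rank_of(idx["pp"], idx["dp"], idx["tp"], dp, tp))
        result.append(group)
    return result
```

With 8 ranks and `pp = dp = tp = 2`, TP groups are pairs of neighbouring ranks, while each PP group links one rank in the first half of the world with its counterpart in the second half. An HSDP-based 4D setup would split the DP dimension further into replicate and shard sub-dimensions.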

JaheimLee · Apr 12 '24 03:04