PiPPy
Request for training examples using PipeStage and PipeSchedule
Would it be possible to provide a few examples of how to train a network using PipeStage and PipeSchedule? All the examples I have gone through so far are dedicated to inference.
We are working on consolidating PipelineStage's runtime code with that of PipelineSchedule. One outcome of that work will be training support. We will create a few examples when it is done. If you'd like an early example, here is a unit test of the backward pass for the consolidated code: https://github.com/pytorch/PiPPy/pull/980/files#diff-a99c7fe997a1aef6c41fadf81ff7fb13e27172dd7f97b87dd81ca7d38cf833a9
Here is a simple training example: https://github.com/pytorch/PiPPy/blob/main/examples/basic/example_train.py
@kwen2501 I hope you can provide more complex end-to-end examples. I noticed PyTorch will add HSDP + TP/SP (FSDP2) as a new feature. It would be great if you could create a 3D (FSDP + TP/SP + PiPPy) or even 4D (HSDP + TP/SP + PiPPy) parallel HF training demo. It seems that none of these features require modifying the LLM's source code.