PiPPy
Pipeline Parallelism for PyTorch
I tried example_train.py with distributed PyTorch and it works well. However, the example splits the network layer by layer and assigns each layer to a single machine. Is it possible...
https://github.com/jamesr66a/PiPPy/blob/527af1fd8123d35bd81b9fe304a8d0ed29c9fd8d/pippy/PipelineDriver.py#L565 I wrote `run_until` as a hack, it should probably be a copy-paste of `Interpreter.run` with some termination branch inside (or we should refactor `Interpreter.run` to make implementing like this...
I would like to use PiPPy for distributed inference with multiple machines and multiple GPUs. However, most of the test cases in the repository are for single-machine testing. Can you...
After installing from source with the PyTorch nightlies, I tried to add TP to the HF inference example, and it fails with `RuntimeError: aten.add.Tensor: got mixed distributed and non-distributed tensors`. I am wondering if...
I am trying to reproduce the [gpt2 example](https://github.com/pytorch/PiPPy/tree/main/examples/hf/gpt2) on a single node without Slurm to collect some performance metrics, but the code only provides Slurm scripts. How should I modify the...
Hi, I want to know whether I can use PiPPy's PP capability together with DeepSpeed's ZeRO-3 config, so that the combination gives 3D parallelism. Thanks.
Hi~ Thanks for your nice repo! Steps to reproduce the bug: 1. change the original examples/hf/bert/pippy_bert.py to the following: ```python # Copyright (c) Meta Platforms, Inc. and affiliates import argparse...
I am running HF_inference.py on my CPU and it works well: it successfully applies pipeline parallelism on CPU. However, when applying pipeline parallelism, I found that each rank will...
Hi guys, this project is awesome compared with torch pipe. I have some models that are [not supported by tracing](https://pytorch.org/docs/stable/fx.html#limitations-of-symbolic-tracing). Do you have plans to support [torchDynamo](https://github.com/pytorch/torchdynamo) to get...