PiPPy
Pipeline Parallelism for PyTorch
- DDP + PP
- FSDP + PP
- FSDP + SP + PP
Using the new `example_train.py` and running it with `torchrun --nproc-per-node 3 example_train.py` results in the example hanging when using CPU devices. I have been able to reproduce this on Windows...
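When a rank hangs like this, it can help to see where each process is stuck before digging into the communication layer. The following is a minimal sketch using only Python's standard `faulthandler` module; the helper names `arm_hang_watchdog` and `disarm_hang_watchdog` are hypothetical and are not part of PiPPy or `torchrun`.

```python
import faulthandler
import sys


def arm_hang_watchdog(timeout_s: float, file=sys.stderr, repeat: bool = False) -> None:
    """Dump the stack of every thread to `file` if the process is still
    running `timeout_s` seconds from now. Hypothetical helper name."""
    # exit=False: only report where the process is stuck, do not kill it.
    faulthandler.dump_traceback_later(timeout_s, repeat=repeat, file=file, exit=False)


def disarm_hang_watchdog() -> None:
    """Cancel a pending dump once the guarded section has finished."""
    faulthandler.cancel_dump_traceback_later()
```

Calling `arm_hang_watchdog(120)` near the top of the script on every rank, and `disarm_hang_watchdog()` after the last step, would print a per-rank traceback if the run stalls for two minutes. The 120-second value is an arbitrary assumption.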
Test case: `torchrun --nproc-per-node 4 test_fwd.py` Reason: When stage 0 finishes computation and hits `batch_send`, all corresponding comms from other ranks...
How can I import `compile_stage` to initialize my stage? It is not found after installation.
The Llama example works fine when run with 2 GPUs: `torchrun --nproc-per-node 2 pippy_llama.py` Output: `['know', 'think', 'you', 'be', 'getting', 'great', 'favorite', 'right']` However, the example hangs when run with 4 or...
How do I install the torchvision build supported by the dev version of PyTorch required to run PiPPy?
Need to investigate whether this is a test issue, a PiPPy issue, or a general PyTorch issue.
After an issue was raised on the accelerate repo, I tried adapting https://github.com/pytorch/PiPPy/pull/943 to work with `stable-diffusion-v1-5`. However, I found that we can no longer trace, and dynamo finds an error...
`File "/home/meisme/PiPPy/examples/inference/hf_generate.py", line 12, in <module>` `import pippy.fx` `ModuleNotFoundError: No module named 'pippy.fx'` The `pippy.fx` module seems to appear nowhere.
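To confirm whether a submodule such as `pippy.fx` exists in the installed package, rather than the example script being out of date, one option is to probe for it with the standard library. This is a rough sketch; `module_available` is a hypothetical helper, and the result depends entirely on which PiPPy version is installed locally.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located by the import system."""
    try:
        # For dotted names, find_spec imports the parent package first.
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when the parent package is missing or is not a package.
        return False


for candidate in ("pippy", "pippy.fx"):
    status = "found" if module_available(candidate) else "missing"
    print(f"{candidate}: {status}")
```

If `pippy` is found but `pippy.fx` is missing, the installed release and the example script are likely from different versions of the repository.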