PiPPy
Pipeline Parallelism for PyTorch
I'm not very familiar with pipeline parallelism. Can it work if most of the model's parameters are frozen?
Hi, I am trying to run the example script provided for the llama model, for inference only. Since the repository is going through a migration and a lot of changes, I went...
Hi, I'm trying to use PiPPy with a custom model that takes both `input_ids` and `labels` as inputs. To test this functionality, I modified the basic pippy_gpt2.py example by...
It seems that the examples here are all for inference; where are the training examples?
The `split_spec` parameter is not being passed to the `pipeline` function, so the entire model is treated as a single stage. This causes `RuntimeError: Pipeline group size 2...
In examples/checkpoint/toy_model.py, `from pippy.compile import compile_stage` and `from pippy.SaveModule import save_checkpoint` fail — do these modules no longer exist?
I was experimenting with loading the `qwen2` model with world-size 2. I am loading the workers entirely on CPU. The following is the code I was testing: ``` import os import torch...
Missing `split_spec` in `pipeline` will cause error like: ``` RuntimeError: Pipeline group size 4 cannot be larger than number of stages 1 ```
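The two `split_spec` reports above come down to the same arithmetic check: splitting must produce at least as many pipeline stages as there are ranks in the pipeline group. A minimal pure-Python sketch of that check (the function names and layer names here are illustrative, not PiPPy's actual API):

```python
# Sketch: how split points determine the stage count, and why a missing
# split_spec trips the "group size cannot be larger than number of
# stages" check. All names below are hypothetical, not PiPPy's API.

def count_stages(layer_names, split_points):
    """Each split point that names an existing layer starts a new stage."""
    return 1 + sum(1 for name in split_points if name in layer_names)

def check_pipeline(world_size, layer_names, split_points):
    """Raise, as PiPPy does, when there are more ranks than stages."""
    num_stages = count_stages(layer_names, split_points)
    if world_size > num_stages:
        raise RuntimeError(
            f"Pipeline group size {world_size} cannot be larger than "
            f"number of stages {num_stages}"
        )
    return num_stages

layers = [f"layers.{i}" for i in range(8)]

# With no split_spec the whole model is one stage, so 4 ranks fail:
try:
    check_pipeline(4, layers, [])
except RuntimeError as e:
    print(e)

# One split point yields 2 stages, which matches a world size of 2:
print(check_pipeline(2, layers, ["layers.4"]))
```

In other words, passing a `split_spec` with `world_size - 1` split points (or otherwise ensuring the split yields one stage per rank) avoids the error.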
Hi, I was able to run llama for a single forward pass, but when I tried to make it generate text autoregressively, there were errors with input shapes....
Hi, when I try to run `pippy_llama.py` from this repo, it shows a bug: ``` root@6e61f182b97b:/zt/code/my_dev# torchrun --nproc-per-node 4 pippy_llama.py W1027 12:28:26.326000 25180 torch/distributed/run.py:793] W1027 12:28:26.326000...