PiPPy
Pipeline Parallelism for PyTorch
When trying to run the examples I always run into this error:
```
File "/home/ubuntu/.local/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1512, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.generation.utils because of...
```
Each test/example is almost identical except for the model itself; we should unify the test boilerplate code.
Models can be big. Therefore we would need to:
- create the model's "skeleton" on the meta device,
- partition it so that it can fit on each device, and
- ...
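The first and last of those steps can be sketched with plain PyTorch (a minimal illustration of the meta-device workflow; the toy model and the slicing-based "partition" here are hypothetical, not PiPPy's actual splitting API):

```python
import torch
import torch.nn as nn

# Build the model "skeleton" on the meta device: parameter shapes and the
# module tree exist, but no memory is allocated for parameter data.
with torch.device("meta"):
    model = nn.Sequential(
        nn.Linear(1024, 1024),
        nn.ReLU(),
        nn.Linear(1024, 1024),
    )

# Every parameter is a meta tensor: it has a shape but no storage.
assert all(p.is_meta for p in model.parameters())

# "Partition": e.g. give each pipeline stage a contiguous slice of submodules.
stages = [model[:2], model[2:]]

# Materialize one stage's parameters on its target device with to_empty()
# (allocates real, uninitialized storage; real code would then load weights).
stage0 = stages[0].to_empty(device="cpu")
assert not any(p.is_meta for p in stage0.parameters())
```

Only the stage assigned to a given rank ever gets real storage, which is what lets a model too big for one device be instantiated piecewise.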
Hi, I followed the instructions to install PyTorch for pipelining:
```
pip install -r requirements.txt --find-links https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
```
I have version `2.4.0.dev20240605+cpu` installed. When I execute `torchrun --nproc-per-node 4 pippy_gpt2.py`...
Hi, I used PyTorch version 2.5.0.dev20240613+cu124 with Python 3.10.14. When I ran the OPT example with `torchrun --nproc-per-node 2 pippy_opt.py`, I got this error:
```
[rank0]: Traceback (most recent...
```
torch version: 2.5.0.dev20240616+cu121, Python version: 3.8. I ran the Llama example with `torchrun --nproc-per-node 2 pippy_llama.py` and got an error:
```
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 3/3 [00:15
```
According to the `no_sync` function description in https://github.com/pytorch/pytorch/blob/main/torch/nn/parallel/distributed.py#L1424:
```
.. warning::
    The forward pass should be included inside the context manager, or else
    gradients will still be synchronized.
```
The current...
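To illustrate why the forward pass must sit inside the context manager, here is a pure-Python mock of the pattern (a hedged sketch; `MockDDP` is hypothetical and only mimics the flag-toggling semantics, since DDP records at forward time whether the next backward should synchronize gradients):

```python
from contextlib import contextmanager

class MockDDP:
    """Toy stand-in mimicking how DDP's no_sync toggles a sync flag."""
    def __init__(self):
        self.require_backward_grad_sync = True
        self.log = []

    @contextmanager
    def no_sync(self):
        old = self.require_backward_grad_sync
        self.require_backward_grad_sync = False
        try:
            yield
        finally:
            self.require_backward_grad_sync = old

    def forward(self, x):
        # The flag's value *at forward time* decides whether the matching
        # backward pass will synchronize gradients.
        self.log.append(("forward", self.require_backward_grad_sync))
        return x

model = MockDDP()

# Correct: forward inside no_sync -> the matching backward skips grad sync.
with model.no_sync():
    model.forward(1)

# Wrong: forward outside no_sync -> the flag was True at forward time,
# so gradients would still be synchronized even if backward ran inside.
model.forward(1)

assert model.log == [("forward", False), ("forward", True)]
```

This is why wrapping only the backward call in `no_sync` has no effect: the decision was already captured when forward ran.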
Enabled some cases to work where `num_microbatches % pp_size != 0`. Using the flex_pp schedule, we will have `num_rounds = max(1, n_microbatches // pp_group_size)` and it works as long as...
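As a quick sanity check of that formula (plain Python; the flex_pp schedule itself lives in the PR, this only reproduces the round arithmetic):

```python
def num_rounds(n_microbatches: int, pp_group_size: int) -> int:
    # Each round feeds up to pp_group_size microbatches through the pipeline;
    # the result is clamped to at least one round.
    return max(1, n_microbatches // pp_group_size)

assert num_rounds(8, 4) == 2   # evenly divisible: two full rounds
assert num_rounds(5, 4) == 1   # 5 % 4 != 0: still a single round
assert num_rounds(2, 4) == 1   # fewer microbatches than stages: clamped to 1
```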
## Description
Please read our [CONTRIBUTING.md](https://github.com/pytorch/PiPPy/blob/main/CONTRIBUTING.md) prior to creating your first pull request. Please include a summary of the feature or issue being fixed. Please also include relevant motivation and...
It seems like pipelining could greatly simplify implementing a feature such as FairScale's OffloadModel (https://fairscale.readthedocs.io/en/latest/deep_dive/offload.html). Is this something that is feasible?