
Pipeline Parallelism for PyTorch

123 PiPPy issues, sorted by recently updated

When trying to run the examples, I always seem to run into this error:
```
File "/home/ubuntu/.local/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1512, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.generation.utils because of...
```

Each test/example is almost identical except for the model itself; we should unify the test boilerplate code.

high-pri
2022 H2

Models can be big. Therefore we would need to:

- create the model's "skeleton" on meta device
- partition it so that it can fit on each device, and
- ...
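The "skeleton on meta device" step above can be sketched as follows. This is a minimal illustration with plain `torch.nn` modules and a naive index-based split, not PiPPy's actual partitioning API:

```python
import torch
import torch.nn as nn

# Build the model's "skeleton" on the meta device: parameter shapes and
# structure exist, but no real memory is allocated for their storage.
with torch.device("meta"):
    model = nn.Sequential(
        nn.Linear(1024, 4096),
        nn.ReLU(),
        nn.Linear(4096, 1024),
    )

print(model[0].weight.device)  # meta
print(sum(p.numel() for p in model.parameters()))  # shapes are known without storage

# A naive partition into two stages (illustrative only); each stage could
# then be materialized on its own device, e.g. via Module.to_empty().
stages = [model[:2], model[2:]]
```

Once partitioned, each stage can be given real (empty) storage on its target device with `stage.to_empty(device=...)` before loading weights into it.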

cla signed

Hi, I followed the instructions to install PyTorch for pipelining:
```
pip install -r requirements.txt --find-links https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
```
I have version `2.4.0.dev20240605+cpu` installed. When I execute `torchrun --nproc-per-node 4 pippy_gpt2.py`...

Hi, I used PyTorch version 2.5.0.dev20240613+cu124 with Python version 3.10.14. When I ran the OPT example with `torchrun --nproc-per-node 2 pippy_opt.py`, I got this error:
```
[rank0]: Traceback (most recent...
```

torch version: 2.5.0.dev20240616+cu121; Python version: 3.8. I ran the llama example with `torchrun --nproc-per-node 2 pippy_llama.py` and got an error:
```
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 3/3 [00:15
```

According to the `no_sync` function description in https://github.com/pytorch/pytorch/blob/main/torch/nn/parallel/distributed.py#L1424:
```
.. warning::
    The forward pass should be included inside the context manager, or else
    gradients will still be synchronized.
```
The current...
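For reference, correct usage places the forward pass (not just the backward pass) inside `no_sync`. The sketch below uses an assumed single-process gloo group purely so DDP can be constructed; real training would launch multiple ranks with torchrun:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Illustrative single-process setup (assumption for the sketch, not a
# real deployment); lets us construct DDP without torchrun.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(torch.nn.Linear(8, 8))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(4, 8)

# Correct: forward AND backward run inside the context manager, so
# gradient synchronization is skipped for this micro-batch.
with model.no_sync():
    model(x).sum().backward()

# Final micro-batch outside the context: gradients are synchronized here.
model(x).sum().backward()
opt.step()

dist.destroy_process_group()
```

Running only `backward()` inside the context (with the forward pass outside) is the mistake the warning describes: the reducer hooks are already armed by the forward pass, so gradients get synchronized anyway.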

Enabled some cases to work where `num_microbatches % pp_size != 0`. Using the flex_pp schedule, we will have `num_rounds = max(1, n_microbatches // pp_group_size)`, and it works as long as...
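As a sanity check on the round arithmetic above, here is a hypothetical helper (not the actual scheduler code) showing how the formula behaves when the microbatch count does not divide evenly:

```python
def num_rounds(n_microbatches: int, pp_group_size: int) -> int:
    # The flex_pp round count quoted above: integer division,
    # clamped so there is always at least one round.
    return max(1, n_microbatches // pp_group_size)

print(num_rounds(8, 4))  # 2: microbatches divide the group size evenly
print(num_rounds(6, 4))  # 1: still works even though 6 % 4 != 0
print(num_rounds(2, 4))  # 1: clamped to at least one round
```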

cla signed

## Description Please read our [CONTRIBUTING.md](https://github.com/pytorch/PiPPy/blob/main/CONTRIBUTING.md) prior to creating your first pull request. Please include a summary of the feature or issue being fixed. Please also include relevant motivation and...

cla signed

It seems like pipelining could greatly simplify the implementation of a feature such as fairscale's OffloadModel (https://fairscale.readthedocs.io/en/latest/deep_dive/offload.html). Is this something that is feasible?