PiPPy
Pipeline Parallelism for PyTorch
When trying to run the examples I always run into this error:
```
File "/home/ubuntu/.local/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1512, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.generation.utils because of...
```
Each test/example is almost identical except for the model itself; we should unify the test boilerplate code.
Models can be big. Therefore we would need to:
- create the model's "skeleton" on the meta device,
- partition it so that it can fit on each device, and
- ...
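The first and last of those steps can be sketched with plain PyTorch (a minimal illustration of the meta-device workflow; the toy model and the slicing-based "partition" here are hypothetical, not PiPPy's actual splitting API):

```python
import torch
import torch.nn as nn

# Build the model "skeleton" on the meta device: parameter shapes and the
# module tree exist, but no memory is allocated for parameter data.
with torch.device("meta"):
    model = nn.Sequential(
        nn.Linear(1024, 1024),
        nn.ReLU(),
        nn.Linear(1024, 1024),
    )

# Every parameter is a meta tensor: it has a shape but no storage.
assert all(p.is_meta for p in model.parameters())

# "Partition": e.g. give each pipeline stage a contiguous slice of submodules.
stages = [model[:2], model[2:]]

# Materialize one stage's parameters on its target device with to_empty()
# (allocates real, uninitialized storage; real code would then load weights).
stage0 = stages[0].to_empty(device="cpu")
assert not any(p.is_meta for p in stage0.parameters())
```

Only the stage assigned to a given rank ever gets real storage, which is what lets a model too big for one device be instantiated piecewise.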
Hi, I followed the instructions to install PyTorch for pipelining:
```
pip install -r requirements.txt --find-links https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
```
I have version `2.4.0.dev20240605+cpu` installed. When I execute `torchrun --nproc-per-node 4 pippy_gpt2.py`...
Hi, I used PyTorch version 2.5.0.dev20240613+cu124 with Python 3.10.14. When I ran the OPT example with `torchrun --nproc-per-node 2 pippy_opt.py`, I got this error:
```
[rank0]: Traceback (most recent...
```
torch version: 2.5.0.dev20240616+cu121, Python version: 3.8. I ran the Llama example with `torchrun --nproc-per-node 2 pippy_llama.py` and got an error:
```
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████| 3/3 [00:15
```
According to the `no_sync` function description in https://github.com/pytorch/pytorch/blob/main/torch/nn/parallel/distributed.py#L1424:
```
.. warning::
    The forward pass should be included inside the context manager, or else
    gradients will still be synchronized.
```
The current...
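To illustrate why the forward pass must sit inside the context manager, here is a pure-Python mock of the pattern (a hedged sketch; `MockDDP` is hypothetical and only mimics the flag-toggling semantics, since DDP records at forward time whether the next backward should synchronize gradients):

```python
from contextlib import contextmanager

class MockDDP:
    """Toy stand-in mimicking how DDP's no_sync toggles a sync flag."""
    def __init__(self):
        self.require_backward_grad_sync = True
        self.log = []

    @contextmanager
    def no_sync(self):
        old = self.require_backward_grad_sync
        self.require_backward_grad_sync = False
        try:
            yield
        finally:
            self.require_backward_grad_sync = old

    def forward(self, x):
        # The flag's value *at forward time* decides whether the matching
        # backward pass will synchronize gradients.
        self.log.append(("forward", self.require_backward_grad_sync))
        return x

model = MockDDP()

# Correct: forward inside no_sync -> the matching backward skips grad sync.
with model.no_sync():
    model.forward(1)

# Wrong: forward outside no_sync -> the flag was True at forward time,
# so gradients would still be synchronized even if backward ran inside.
model.forward(1)

assert model.log == [("forward", False), ("forward", True)]
```

This is why wrapping only the backward call in `no_sync` has no effect: the decision was already captured when forward ran.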
Enabled some cases to work where `num_microbatches % pp_size != 0`. Using the flex_pp schedule, we will have `num_rounds = max(1, n_microbatches // pp_group_size)` and it works as long as...
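As a quick sanity check of that formula (plain Python; the flex_pp schedule itself lives in the PR, this only reproduces the round arithmetic):

```python
def num_rounds(n_microbatches: int, pp_group_size: int) -> int:
    # Each round feeds up to pp_group_size microbatches through the pipeline;
    # the result is clamped to at least one round.
    return max(1, n_microbatches // pp_group_size)

assert num_rounds(8, 4) == 2   # evenly divisible: two full rounds
assert num_rounds(5, 4) == 1   # 5 % 4 != 0: still a single round
assert num_rounds(2, 4) == 1   # fewer microbatches than stages: clamped to 1
```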
## Description
Please read our [CONTRIBUTING.md](https://github.com/pytorch/PiPPy/blob/main/CONTRIBUTING.md) prior to creating your first pull request. Please include a summary of the feature or issue being fixed. Please also include relevant motivation and...
It seems like pipelining could greatly simplify implementing a feature such as FairScale's OffloadModel (https://fairscale.readthedocs.io/en/latest/deep_dive/offload.html). Is this something that is feasible?