PiPPy
Pipeline Parallelism for PyTorch
I'm experimenting with the pipelined ResNet training example (`pippy_resnet.py`) from [https://github.com/pytorch/PiPPy/tree/main/examples/resnet](https://github.com/pytorch/PiPPy/tree/main/examples/resnet). Specifically, I want to compare the loss when running locally on one GPU against the loss when running with `pippy`. I...
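For what it's worth, a minimal single-GPU reference loop like the sketch below is one way to get a baseline loss to compare against the pipelined run. This is not the example's own code: the model, seed, and dummy batch are assumptions, and the real dataset wiring is elided.

```python
# Minimal single-GPU reference run (a sketch, not from pippy_resnet.py).
# Fixing the seed and using the same batch order is assumed necessary
# before the local loss can be compared against the pipelined loss.
import torch
import torch.nn as nn
import torchvision

torch.manual_seed(0)

model = torchvision.models.resnet18(num_classes=10).cuda()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

# Dummy batch; in practice this would come from the same seeded data
# loader that the pipelined run uses.
x = torch.randn(32, 3, 32, 32, device="cuda")
y = torch.randint(0, 10, (32,), device="cuda")

for step in range(3):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    print(f"step {step}: loss {loss.item():.6f}")  # compare against the pippy run
```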
After a certain torch 2.2.0.dev version, the split submodules are named submod_0, submod_2, submod_4, ... instead of submod_0, submod_1, submod_2, ..., so this assert fails: ``` pippy/IR.py", line 682, in _number_and_count_forward_stages assert all(i in... ```
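Until the assert is relaxed upstream, one workaround is to number stages by the sorted order of the `submod_<i>` suffix rather than asserting that the suffixes are contiguous. A sketch, assuming stage names always match `submod_<int>` (the helper name is hypothetical, not PiPPy's):

```python
# A tolerant renumbering pass: the submodule suffixes may be
# non-contiguous (0, 2, 4, ...), so instead of asserting that every
# index 0..N-1 is present, sort by suffix and assign consecutive ids.
import re

def number_forward_stages(submod_names):
    pat = re.compile(r"^submod_(\d+)$")
    indexed = sorted(
        (int(m.group(1)), name)
        for name in submod_names
        if (m := pat.match(name))
    )
    # stage id = position in sorted order, not the raw suffix
    return {name: stage for stage, (_, name) in enumerate(indexed)}

print(number_forward_stages(["submod_0", "submod_2", "submod_4"]))
# {'submod_0': 0, 'submod_2': 1, 'submod_4': 2}
```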
When I run `torchrun --rdzv-backend=c10d --rdzv-endpoint=localhost:29500 --nnodes=1 --nproc-per-node=4 test_pipeline_schedule.py --schedules gpipe`, I get the following output: ```shell [2023-12-03 08:40:53,722] torch.distributed.run: [WARNING] [2023-12-03 08:40:53,722] torch.distributed.run: [WARNING] ***************************************** [2023-12-03 08:40:53,722] torch.distributed.run: [WARNING] Setting... ```
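For context, a script launched this way is expected to read the environment variables torchrun sets (`RANK`, `LOCAL_RANK`, `WORLD_SIZE`, plus the rendezvous-derived `MASTER_ADDR`/`MASTER_PORT`) and initialize the process group itself. A minimal sketch of that boilerplate (not the actual test file):

```python
# Boilerplate a torchrun-launched script usually runs, assuming the
# standard env vars torchrun sets.
import os
import torch
import torch.distributed as dist

rank = int(os.environ["RANK"])
local_rank = int(os.environ["LOCAL_RANK"])
world_size = int(os.environ["WORLD_SIZE"])

torch.cuda.set_device(local_rank)
dist.init_process_group(backend="nccl")  # MASTER_ADDR/PORT come from torchrun

print(f"rank {rank}/{world_size} on cuda:{local_rank} initialized")
dist.destroy_process_group()
```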
Graph interpretation refers to:
- Figuring out the stage-module-to-rank mapping
- Figuring out the stage-to-stage communication relationships (connections, tensor transmission sizes, etc.)

Pipeline executor refers to:
- Running micro-chunked...
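A sketch of the kind of artifacts graph interpretation might hand to the executor; all names here are hypothetical, not PiPPy's actual data structures:

```python
# Hypothetical outputs of "graph interpretation": a stage-to-rank
# placement plus per-edge communication metadata that the pipeline
# executor can then act on.
from dataclasses import dataclass

@dataclass
class CommEdge:
    src_stage: int
    dst_stage: int
    tensor_shape: tuple  # transmission size per micro-batch
    dtype: str

# Stage placement: stage i runs on rank stage_to_rank[i].
stage_to_rank = {0: 0, 1: 1, 2: 2, 3: 3}

# Stage-to-stage connections discovered from the split graph.
edges = [
    CommEdge(0, 1, (8, 512), "float32"),
    CommEdge(1, 2, (8, 512), "float32"),
    CommEdge(2, 3, (8, 512), "float32"),
]

for e in edges:
    print(f"stage {e.src_stage} (rank {stage_to_rank[e.src_stage]}) -> "
          f"stage {e.dst_stage} (rank {stage_to_rank[e.dst_stage]}): "
          f"{e.tensor_shape} {e.dtype}")
```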
Using the latest nightly (1109) and running on an H100 server: running tests/local_test_c10d.py results in the final tensor comparison failing with a 16% mismatch (it appears to be rounding; the largest diff is 0.0097). ~~~...
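If the drift is TF32-related (TF32 matmuls are enabled by default on Ampere/Hopper GPUs such as H100), a sketch of two ways to confirm or absorb it; the tolerance values are assumptions sized to the ~1e-2 diff reported above:

```python
# Two common ways to handle small numeric drift from TF32 matmuls.
import torch

a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")

# Option 1: disable TF32 so the matmul runs in full fp32.
torch.backends.cuda.matmul.allow_tf32 = False
ref = a @ b

# Option 2: keep TF32 but compare with relaxed tolerances.
torch.backends.cuda.matmul.allow_tf32 = True
fast = a @ b

print((fast - ref).abs().max())  # expect ~1e-2-scale differences from TF32
torch.testing.assert_close(fast, ref, rtol=1e-2, atol=0.1)
```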
I'm trying to fine-tune for language modeling, freezing the first few layers of RoBERTa. The code is pretty similar to the `run_mlm.py` example from [https://github.com/pytorch/PiPPy/tree/main/examples/hf/language-modeling](https://github.com/pytorch/PiPPy/tree/main/examples/hf/language-modeling). But I get an error at the step of...
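For reference, the freezing itself is straightforward given the standard Hugging Face transformers module layout; a sketch (the layer count `N_FROZEN` is arbitrary, and this says nothing about how the frozen parameters interact with the elided failing step):

```python
# Freeze the embeddings and the first N encoder layers of RoBERTa.
from transformers import RobertaForMaskedLM

model = RobertaForMaskedLM.from_pretrained("roberta-base")

N_FROZEN = 4
for p in model.roberta.embeddings.parameters():
    p.requires_grad = False
for layer in model.roberta.encoder.layer[:N_FROZEN]:
    for p in layer.parameters():
        p.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable params: {trainable}")
```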
I'm trying to run a model based on **RoBERTa**, by analogy with the `run_mlm.py` example from [https://github.com/pytorch/PiPPy/tree/main/examples/hf/language-modeling](https://github.com/pytorch/PiPPy/tree/main/examples/hf/language-modeling). But when using the function `split_into_equal_size`, I get a submodule without any layers. This can...
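As a toy illustration (not PiPPy's actual splitting code, and with made-up parameter counts) of how a size-balanced split can produce an empty stage when one module dominates the parameter budget:

```python
# A greedy partition by parameter count: if one block (e.g. the
# embeddings) dominates the total, a bucket can be closed before any
# transformer layer lands in it, and the last stage ends up empty.
param_counts = {
    "embeddings": 50_000_000,
    "layer_0": 7_000_000,
    "layer_1": 7_000_000,
    "layer_2": 7_000_000,
    "layer_3": 7_000_000,
    "lm_head": 39_000_000,
}
n_stages = 4
target = sum(param_counts.values()) / n_stages

stages, current, acc = [], [], 0
for name, n in param_counts.items():
    current.append(name)
    acc += n
    if acc >= target and len(stages) < n_stages - 1:
        stages.append(current)
        current, acc = [], 0
stages.append(current)
print(stages)  # note the empty final stage
```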
I played with the [`hf_generate`](https://github.com/pytorch/tau/pull/772) branch, and it seems quite ready to be expanded to support BLOOM-3B/7B1 models, etc. (https://github.com/zsc/tau/pull/1). Great work! Is there an imminent plan to...
In my understanding, pipeline parallelism is decentralized, so why is a master needed in the example? `args.world_size = 5  # "This program requires exactly 4 workers + 1 master"`
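For background, the older RPC-based PiPPy examples used a driver/worker layout: one extra rank traces the model and drives the schedule, while the worker ranks host pipeline stages and serve RPCs. A sketch of that pattern using plain `torch.distributed.rpc`; which rank is designated master (the last one here) and the driver logic itself are assumptions:

```python
# Driver/worker layout: N worker ranks host stages, one extra rank drives.
import os
import torch.distributed.rpc as rpc

rank = int(os.environ["RANK"])
world_size = int(os.environ["WORLD_SIZE"])  # e.g. 4 workers + 1 master = 5

rpc.init_rpc(f"worker{rank}", rank=rank, world_size=world_size)

if rank == world_size - 1:  # which rank acts as master varies by example
    # Master: build the pipeline, place stages on the worker ranks via
    # RPC, then feed mini-batches into the pipeline driver.
    pass  # driver logic elided in this sketch

# Workers block here serving RPCs until the master is done.
rpc.shutdown()
```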
```
root@6496cf66be1e:/workspace/PiPPy/examples/resnet# python pippy_resnet.py -s=1F1B
[PiPPy] World size: 5, DP group size: 1, PP group size: 5
rank = 4 host/pid/device = 6496cf66be1e/2823/cuda:4
[W socket.cpp:601] [c10d] The client socket has failed...
```