
How to use PiPPy for large models that won't fit on one GPU

Open aspiridon0v opened this issue 1 year ago • 5 comments

Hello, I was wondering if someone could provide an example or some guidance on how to use PiPPy for models that will not fit on one GPU. I want to run pipeline parallelism with Llama2 70B on a node with multiple A100 GPUs. However, if I run the pippy_llama.py example, every process tries to load the whole model onto the GPU corresponding to its local rank, which causes a CUDA out-of-memory error.

aspiridon0v avatar Mar 23 '24 15:03 aspiridon0v

Hi, that's indeed an important use case.

In the folder below, we have a CPU initialization example based on GPT-2: https://github.com/pytorch/PiPPy/tree/main/examples/cpu_init PiPPy lets you create the model on CPU, turn it into a pipeline, and move the different stages onto their corresponding GPUs.
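As a general illustration of the idea (plain PyTorch, not the actual PiPPy API; the two-way split and the `rank` variable here are hypothetical), the model is built on CPU so no single GPU ever has to hold all the weights, and each rank then moves only its own stage to its device:

```python
import torch
import torch.nn as nn

# Toy model built on CPU; PiPPy's pipeline() would do the splitting
# automatically, here we split by hand for illustration.
full_model = nn.Sequential(
    nn.Linear(16, 32),  # stage 0
    nn.ReLU(),
    nn.Linear(32, 16),  # stage 1
)

rank = 0  # would come from torch.distributed in practice
stage_modules = [full_model[:2], full_model[2:]]
device = torch.device("cpu")  # would be f"cuda:{rank}" on a real node

# Only this rank's slice of the model is moved to its device.
my_stage = stage_modules[rank].to(device)

out = my_stage(torch.randn(4, 16))
print(out.shape)  # torch.Size([4, 32])
```

The key point is that peak memory per GPU is bounded by one stage's parameters, not the full model's, at the cost of holding the whole model in CPU RAM during construction.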

We also have semi-support for meta init. Today, one can create a pipeline from a meta model:

with torch.device("meta"):
    model = Model(...)

pipe = pipeline(model, ...)

However, we are still working on loading weights into different pipeline stages on different processes (so as to turn the meta stages into materialized stages). We will update here when that's complete.
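For context, a sketch of what materializing a meta stage involves in plain PyTorch (this is not PiPPy's eventual mechanism, just the underlying `to_empty` + `load_state_dict` pattern; the random `state` dict stands in for a real checkpoint shard):

```python
import torch
import torch.nn as nn

# Meta tensors record shapes and dtypes but allocate no storage,
# so even a very large model "fits" at this point.
with torch.device("meta"):
    meta_model = nn.Linear(1024, 1024)
assert meta_model.weight.is_meta

# Materialize: allocate empty storage on the target device, then
# copy real weights in (e.g. from a per-stage checkpoint shard).
real = meta_model.to_empty(device="cpu")  # "cuda:<rank>" on a real node
state = {"weight": torch.randn(1024, 1024), "bias": torch.zeros(1024)}
real.load_state_dict(state)
print(real.weight.is_meta)  # False
```

The open problem described above is doing the second step per stage and per process, so each rank only ever reads the slice of the checkpoint it needs.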

Hopefully, the CPU init example can unblock your use case for now, though note that it requires enough CPU RAM on your machine to hold the full model.

Cc: @LucasLLC @wconstab

kwen2501 avatar Mar 25 '24 01:03 kwen2501

If you can share with us which type of checkpoint format you want support of, that would help prioritize things too.

kwen2501 avatar Mar 25 '24 01:03 kwen2501

> If you can share with us which type of checkpoint format you want support of, that would help prioritize things too.

Thank you for getting back to me; the CPU init example was indeed very helpful. I am using the Hugging Face Transformers library to load the Llama2 70B model via the .from_pretrained() method. The checkpoint format (as cached on disk) is:

hash.hash.lock (lock file)
hash.hash.json (configuration file)
hash.hash (binary file containing the model's parameters)

aspiridon0v avatar Mar 26 '24 11:03 aspiridon0v

Cc @LucasLLC @wconstab

kwen2501 avatar Mar 27 '24 19:03 kwen2501

We'll be integrating PiPPy into TorchTrain soon, and along with that we'll get meta initialization or CPU initialization working nicely as an example for folks to see. Are you unblocked for now?

wconstab avatar Mar 30 '24 00:03 wconstab