[Feature]: Pipeline Parallelism support for Llama 3.2 90B Vision model
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
I'm trying to load the Llama 3.2 90B Vision model across two nodes, each with 2 A100 80GB GPUs, using tensor-parallel-size = 1 and pipeline-parallel-size = 4. I'm on the latest published version of vLLM (0.6.2). Loading fails with the NotImplementedError below. Any help resolving this would be greatly appreciated. Thank you.
```
raise NotImplementedError(
NotImplementedError: Pipeline parallelism is only supported for the following architectures: ['AquilaForCausalLM', 'AquilaModel', 'DeepseekV2ForCausalLM', 'GPT2LMHeadModel', 'InternLM2ForCausalLM', 'InternLMForCausalLM', 'InternVLChatModel', 'JAISLMHeadModel', 'LlamaForCausalLM', 'LLaMAForCausalLM', 'MistralForCausalLM', 'MixtralForCausalLM', 'NemotronForCausalLM', 'Phi3ForCausalLM', 'Qwen2ForCausalLM', 'Qwen2MoeForCausalLM', 'QWenLMHeadModel', 'Qwen2VLForConditionalGeneration'].
```
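For context, the architecture vLLM rejects here is the one declared in the model's config.json. A minimal sketch of the check (the supported list is copied from the error above; the checkpoint is gated on Hugging Face, so this assumes you have access):

```python
# Sketch: compare the architecture declared in the model's config.json against
# the PP-capable architectures listed in the error above (vLLM 0.6.2).
from transformers import AutoConfig

# Copied verbatim from the NotImplementedError message.
PP_SUPPORTED = {
    "AquilaForCausalLM", "AquilaModel", "DeepseekV2ForCausalLM", "GPT2LMHeadModel",
    "InternLM2ForCausalLM", "InternLMForCausalLM", "InternVLChatModel",
    "JAISLMHeadModel", "LlamaForCausalLM", "LLaMAForCausalLM", "MistralForCausalLM",
    "MixtralForCausalLM", "NemotronForCausalLM", "Phi3ForCausalLM",
    "Qwen2ForCausalLM", "Qwen2MoeForCausalLM", "QWenLMHeadModel",
    "Qwen2VLForConditionalGeneration",
}

# Gated repo: requires `huggingface-cli login` or an HF token with access.
config = AutoConfig.from_pretrained("meta-llama/Llama-3.2-90B-Vision-Instruct")
for arch in config.architectures:  # reports "MllamaForConditionalGeneration"
    print(arch, "supports PP:", arch in PP_SUPPORTED)  # -> False
```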
Before submitting a new issue...
- [X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
Command that I'm using to load the model:

```
vllm serve meta-llama/Llama-3.2-90B-Vision-Instruct --enforce-eager --max-num-seqs 16 --tensor-parallel-size 4
```
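For reference, the setup described at the top (TP=1, PP=4 across both nodes) maps to roughly the following offline-API call. This is a sketch that assumes a Ray cluster spanning the two nodes is already running; it fails with the same NotImplementedError on 0.6.2:

```python
# Rough offline-API equivalent of the multi-node setup described above (sketch).
# Assumes `ray start --head` / `ray start --address=...` has already joined the
# two nodes, so all 4 GPUs are visible to the placement group.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-3.2-90B-Vision-Instruct",
    tensor_parallel_size=1,
    pipeline_parallel_size=4,  # rejected: Mllama is not in the PP-supported list
    enforce_eager=True,
    max_num_seqs=16,
)
```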
Yeah, PP is not supported for encoder-decoder models yet. See https://github.com/vllm-project/vllm/pull/7168#issuecomment-2391498161
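One possible fallback while encoder-decoder PP is unimplemented (a sketch, not verified on this setup): shard the model with tensor parallelism across all four GPUs instead, using Ray as the distributed backend so TP can span both nodes.

```python
# Sketch of a TP-only fallback: tensor parallelism across all 4 A100s (2 per
# node) via a Ray cluster, instead of pipeline parallelism. Untested here;
# cross-node TP is sensitive to interconnect bandwidth.
from vllm import LLM

llm = LLM(
    model="meta-llama/Llama-3.2-90B-Vision-Instruct",
    tensor_parallel_size=4,
    distributed_executor_backend="ray",  # needed when the world size spans nodes
    enforce_eager=True,
    max_num_seqs=16,
)
```

The equivalent server command would be the `--tensor-parallel-size 4` invocation already posted above, launched on the Ray head node.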
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!