
[Feature]: Support for InternVL-Chat-V1-5

Open Iven2132 opened this issue 9 months ago • 11 comments

🚀 The feature, motivation and pitch

OpenGVLab/InternVL-Chat-V1-2-Plus is an open-source alternative to GPT-4V. Can we please have support for it?

Alternatives

GPT-4V

Additional context

https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus and https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5

Iven2132 avatar Apr 26 '24 09:04 Iven2132

Yes! It's a strong model in my tests.

themrzmaster avatar Apr 26 '24 11:04 themrzmaster

We are currently in the process of improving support for vision-language models. (See #4194)

DarkLight1337 avatar Apr 26 '24 11:04 DarkLight1337

We are currently in the process of improving support for vision-language models. (See #4194)

@DarkLight1337 Can I currently deploy https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus on vLLM?

Iven2132 avatar Apr 27 '24 08:04 Iven2132

@DarkLight1337 Can I currently deploy https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus on vLLM?

You'll at least have to write code for implementing the model in vLLM and registering it so it can be automatically initialized from the HuggingFace weights. You can refer to #4228 for an example.

DarkLight1337 avatar Apr 27 '24 08:04 DarkLight1337
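
For context, a minimal sketch of that registration step, assuming a hypothetical out-of-tree implementation already exists (the module name `my_internvl` and the class `InternVLChatModel` are placeholders, not existing vLLM code):

```python
# Minimal sketch, assuming InternVLChatModel has already been implemented
# following an existing vision-language model in vLLM; `my_internvl` is a
# hypothetical module, not part of vLLM.
from vllm import ModelRegistry

from my_internvl import InternVLChatModel

# Map the architecture name from the checkpoint's config.json to the new
# class so vLLM can initialize it automatically from the HuggingFace weights.
ModelRegistry.register_model("InternVLChatModel", InternVLChatModel)
```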

@DarkLight1337 Can I currently deploy https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus on vLLM?

You'll at least have to write code for implementing the model in vLLM and registering it so it can be automatically initialized from the HuggingFace weights. You can refer to #4228 for an example.

I don't even code in Python. Can I just replace the tokenizer with LLaVA 1.5's and just load OpenGVLab/InternVL-Chat-V1-2-Plus?

Iven2132 avatar Apr 27 '24 08:04 Iven2132

I don't even code in Python. Can I just replace the tokenizer with LLaVA 1.5's and just load OpenGVLab/InternVL-Chat-V1-2-Plus?

That's not possible in general. You'll have to wait until someone implements the model in vLLM. The process of supporting a new vision-language model should become easier once we figure out a more extensible framework for multimodal support.

DarkLight1337 avatar Apr 27 '24 09:04 DarkLight1337

I don't even code in Python. Can I just replace the tokenizer with LLaVA 1.5's and just load OpenGVLab/InternVL-Chat-V1-2-Plus?

That's not possible in general. You'll have to wait until someone implements the model in vLLM. The process of supporting a new vision-language model should become easier once we figure out a more extensible framework for multimodal support.

I think @czczup Can implement this on vLLM.

Iven2132 avatar Apr 27 '24 09:04 Iven2132

I don't even code in Python. Can I just replace the tokenizer with LLaVA 1.5's and just load OpenGVLab/InternVL-Chat-V1-2-Plus?

That's not possible in general. You'll have to wait until someone implements the model in vLLM. The process of supporting a new vision-language model should become easier once we figure out a more extensible framework for multimodal support.

@DarkLight1337 I don't know if it's a valid question, but does vLLM support all the architectures listed in the README? Does that mean I just need to replace the model name with another model of the same architecture and it will work?

Iven2132 avatar Apr 29 '24 14:04 Iven2132

@DarkLight1337 I don't know if it's a valid question, but does vLLM support all the architectures listed in the README? Does that mean I just need to replace the model name with another model of the same architecture and it will work?

Yes, that should be the case.

DarkLight1337 avatar Apr 29 '24 14:04 DarkLight1337
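
In other words (a hedged sketch, not from the thread): any checkpoint whose `architectures` entry in config.json matches an architecture vLLM already implements should load by simply pointing `LLM` at it. The model name below is only an illustration of an InternLM2-architecture checkpoint:

```python
from vllm import LLM, SamplingParams

# Any checkpoint whose config.json `architectures` entry matches an
# architecture listed in vLLM's README should load the same way.
# The model name here is only an illustration; substitute your own.
llm = LLM(model="internlm/internlm2-chat-7b", trust_remote_code=True)

outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(temperature=0.8, max_tokens=32))
print(outputs[0].outputs[0].text)
```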

@DarkLight1337 I don't know if it's a valid question, but does vLLM support all the architectures listed in the README? Does that mean I just need to replace the model name with another model of the same architecture and it will work?

Yes, that should be the case.

If that's the case, then why can't I use InternVL-Chat-V1-5? It's InternLM2-based.

Iven2132 avatar May 01 '24 18:05 Iven2132

Being based on an existing model does not mean that the two have identical architectures.

InternVL-1.5's model card says that it is InternViT-6B-448px-V1-5 + MLP + InternLM2-Chat-20B. So, at least the first two components have to be added to the existing model.

DarkLight1337 avatar May 02 '24 01:05 DarkLight1337
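
To make that component breakdown concrete, here is a rough structural sketch in plain PyTorch. It is not vLLM's actual implementation; the layer sizes and the splicing of visual tokens into the prompt are simplified assumptions:

```python
import torch
import torch.nn as nn


class InternVLChatSketch(nn.Module):
    """Rough structural sketch of ViT + MLP + LLM; not vLLM's implementation."""

    def __init__(self, vision_tower: nn.Module, language_model: nn.Module,
                 vit_hidden: int, llm_hidden: int):
        super().__init__()
        # InternViT-6B-448px-V1-5: encodes image tiles into visual features.
        self.vision_tower = vision_tower
        # MLP projector: maps ViT features into the LLM embedding space.
        self.mlp_projector = nn.Sequential(
            nn.Linear(vit_hidden, llm_hidden),
            nn.GELU(),
            nn.Linear(llm_hidden, llm_hidden),
        )
        # InternLM2-Chat-20B: the text backbone, already implemented in vLLM.
        self.language_model = language_model

    def forward(self, pixel_values: torch.Tensor,
                input_embeds: torch.Tensor) -> torch.Tensor:
        visual_feats = self.vision_tower(pixel_values)
        visual_tokens = self.mlp_projector(visual_feats)
        # In the real model, visual tokens replace image placeholder positions
        # in the prompt; simple concatenation is used here for brevity.
        return self.language_model(
            torch.cat([visual_tokens, input_embeds], dim=1))
```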

Being based on an existing model does not mean that the two have identical architectures.

InternVL-1.5's model card says that it is InternViT-6B-448px-V1-5 + MLP + InternLM2-Chat-20B. So, at least the first two components have to be added to the existing model.

If I'm not misunderstanding you, I need to register InternViT-6B-448px-V1-5 and InternLM2-Chat-20B, and then register InternVL-1.5? Is that correct?

ruifengma avatar May 13 '24 02:05 ruifengma

Being based on an existing model does not mean that the two have identical architectures. InternVL-1.5's model card says that it is InternViT-6B-448px-V1-5 + MLP + InternLM2-Chat-20B. So, at least the first two components have to be added to the existing model.

If I'm not misunderstanding you, I need to register InternViT-6B-448px-V1-5 and InternLM2-Chat-20B, and then register InternVL-1.5? Is that correct?

I mean that you have to at least implement those two components in vLLM. If InternVL-v1.5 has additional layers, they have to be implemented in vLLM as well.

DarkLight1337 avatar May 13 '24 02:05 DarkLight1337

Based on the config file, the model architecture should be the same as InternVL2, so I'm closing this as completed by #6514.

DarkLight1337 avatar Jul 29 '24 11:07 DarkLight1337
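
A quick way to verify that claim yourself (assuming `transformers` is installed) is to inspect the checkpoint's `architectures` field:

```python
from transformers import AutoConfig

# InternVL-Chat-V1-5 should report "InternVLChatModel", the architecture
# name shared with InternVL2, which is why it is covered by #6514.
cfg = AutoConfig.from_pretrained("OpenGVLab/InternVL-Chat-V1-5",
                                 trust_remote_code=True)
print(cfg.architectures)
```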