
[Feature]: Support for InternVL-Chat-V1-5

Open Iven2132 opened this issue 9 months ago • 11 comments

🚀 The feature, motivation and pitch

OpenGVLab/InternVL-Chat-V1-2-Plus is an open-source alternative to GPT-4V. Can we please have support for it?

Alternatives

GPT-4V

Additional context

https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus and https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5

Iven2132 avatar Apr 26 '24 09:04 Iven2132

Yes! It's a strong model in my tests.

themrzmaster avatar Apr 26 '24 11:04 themrzmaster

We are currently in the process of improving support for vision-language models. (See #4194)

DarkLight1337 avatar Apr 26 '24 11:04 DarkLight1337

We are currently in the process of improving support for vision-language models. (See #4194)

@DarkLight1337 Can I currently deploy https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus on vLLM?

Iven2132 avatar Apr 27 '24 08:04 Iven2132

@DarkLight1337 Can I currently deploy https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus on vLLM?

You'll at least have to write code for implementing the model in vLLM and registering it so it can be automatically initialized from the HuggingFace weights. You can refer to #4228 for an example.

DarkLight1337 avatar Apr 27 '24 08:04 DarkLight1337
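
For context, a minimal sketch of that registration step, assuming a hypothetical out-of-tree implementation already exists (the module name `my_internvl` and the class `InternVLChatModel` are placeholders, not existing vLLM code):

```python
# Minimal sketch, assuming InternVLChatModel has already been implemented
# following an existing vision-language model in vLLM; `my_internvl` is a
# hypothetical module, not part of vLLM.
from vllm import ModelRegistry

from my_internvl import InternVLChatModel

# Map the architecture name from the checkpoint's config.json to the new
# class so vLLM can initialize it automatically from the HuggingFace weights.
ModelRegistry.register_model("InternVLChatModel", InternVLChatModel)
```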

@DarkLight1337 Can I currently deploy https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus on vLLM?

You'll at least have to write code for implementing the model in vLLM and registering it so it can be automatically initialized from the HuggingFace weights. You can refer to #4228 for an example.

I don't even code in Python. Can I just replace the tokenizer with LLaVA 1.5's and just load OpenGVLab/InternVL-Chat-V1-2-Plus?

Iven2132 avatar Apr 27 '24 08:04 Iven2132

I don't even code in Python. Can I just replace the tokenizer with LLaVA 1.5's and just load OpenGVLab/InternVL-Chat-V1-2-Plus?

That's not possible in general. You'll have to wait until someone implements the model in vLLM. The process of supporting a new vision-language model should become easier once we figure out a more extensible framework for multimodal support.

DarkLight1337 avatar Apr 27 '24 09:04 DarkLight1337

I don't even code in Python. Can I just replace the tokenizer with LLaVA 1.5's and just load OpenGVLab/InternVL-Chat-V1-2-Plus?

That's not possible in general. You'll have to wait until someone implements the model in vLLM. The process of supporting a new vision-language model should become easier once we figure out a more extensible framework for multimodal support.

I think @czczup Can implement this on vLLM.

Iven2132 avatar Apr 27 '24 09:04 Iven2132

I don't even code in Python. Can I just replace the tokenizer with LLaVA 1.5's and just load OpenGVLab/InternVL-Chat-V1-2-Plus?

That's not possible in general. You'll have to wait until someone implements the model in vLLM. The process of supporting a new vision-language model should become easier once we figure out a more extensible framework for multimodal support.

@DarkLight1337 I don't know if it's a valid question, but does vLLM support all the architectures listed in the README? Does that mean I just need to replace the model name with another model of the same architecture and it will work?

Iven2132 avatar Apr 29 '24 14:04 Iven2132

@DarkLight1337 I don't know if it's a valid question, but does vLLM support all the architectures listed in the README? Does that mean I just need to replace the model name with another model of the same architecture and it will work?

Yes, that should be the case.

DarkLight1337 avatar Apr 29 '24 14:04 DarkLight1337
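
In other words (a hedged sketch, not from the thread): any checkpoint whose `architectures` entry in config.json matches an architecture vLLM already implements should load by simply pointing `LLM` at it. The model name below is only an illustration of an InternLM2-architecture checkpoint:

```python
from vllm import LLM, SamplingParams

# Any checkpoint whose config.json `architectures` entry matches an
# architecture listed in vLLM's README should load the same way.
# The model name here is only an illustration; substitute your own.
llm = LLM(model="internlm/internlm2-chat-7b", trust_remote_code=True)

outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(temperature=0.8, max_tokens=32))
print(outputs[0].outputs[0].text)
```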

@DarkLight1337 I don't know if it's a valid question, but does vLLM support all the architectures listed in the README? Does that mean I just need to replace the model name with another model of the same architecture and it will work?

Yes, that should be the case.

If that's the case, then why can't I use InternVL-Chat-V1-5? It's InternLM2-based.

Iven2132 avatar May 01 '24 18:05 Iven2132

Being based on an existing model does not mean that the two have identical architectures.

InternVL-1.5's model card says that it is InternViT-6B-448px-V1-5 + MLP + InternLM2-Chat-20B. So, at least the first two components have to be added to the existing model.

DarkLight1337 avatar May 02 '24 01:05 DarkLight1337
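
To make that component breakdown concrete, here is a rough structural sketch in plain PyTorch. It is not vLLM's actual implementation; the layer sizes and the splicing of visual tokens into the prompt are simplified assumptions:

```python
import torch
import torch.nn as nn


class InternVLChatSketch(nn.Module):
    """Rough structural sketch of ViT + MLP + LLM; not vLLM's implementation."""

    def __init__(self, vision_tower: nn.Module, language_model: nn.Module,
                 vit_hidden: int, llm_hidden: int):
        super().__init__()
        # InternViT-6B-448px-V1-5: encodes image tiles into visual features.
        self.vision_tower = vision_tower
        # MLP projector: maps ViT features into the LLM embedding space.
        self.mlp_projector = nn.Sequential(
            nn.Linear(vit_hidden, llm_hidden),
            nn.GELU(),
            nn.Linear(llm_hidden, llm_hidden),
        )
        # InternLM2-Chat-20B: the text backbone, already implemented in vLLM.
        self.language_model = language_model

    def forward(self, pixel_values: torch.Tensor,
                input_embeds: torch.Tensor) -> torch.Tensor:
        visual_feats = self.vision_tower(pixel_values)
        visual_tokens = self.mlp_projector(visual_feats)
        # In the real model, visual tokens replace image placeholder positions
        # in the prompt; simple concatenation is used here for brevity.
        return self.language_model(
            torch.cat([visual_tokens, input_embeds], dim=1))
```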

Being based on an existing model does not mean that the two have identical architectures.

InternVL-1.5's model card says that it is InternViT-6B-448px-V1-5 + MLP + InternLM2-Chat-20B. So, at least the first two components have to be added to the existing model.

If I'm not misunderstanding you, I need to register InternViT-6B-448px-V1-5 and InternLM2-Chat-20B, and then register InternVL-1.5? Is that correct?

ruifengma avatar May 13 '24 02:05 ruifengma

Being based on an existing model does not mean that the two have identical architectures. InternVL-1.5's model card says that it is InternViT-6B-448px-V1-5 + MLP + InternLM2-Chat-20B. So, at least the first two components have to be added to the existing model.

If I'm not misunderstanding you, I need to register InternViT-6B-448px-V1-5 and InternLM2-Chat-20B, and then register InternVL-1.5? Is that correct?

I mean that you have to at least implement those two components in vLLM. If InternVL-v1.5 has additional layers, they have to be implemented in vLLM as well.

DarkLight1337 avatar May 13 '24 02:05 DarkLight1337

Based on the config file, the model architecture should be the same as InternVL2, so I'm closing this as completed by #6514.

DarkLight1337 avatar Jul 29 '24 11:07 DarkLight1337
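
A quick way to verify that claim yourself (assuming `transformers` is installed) is to inspect the checkpoint's `architectures` field:

```python
from transformers import AutoConfig

# InternVL-Chat-V1-5 should report "InternVLChatModel", the architecture
# name shared with InternVL2, which is why it is covered by #6514.
cfg = AutoConfig.from_pretrained("OpenGVLab/InternVL-Chat-V1-5",
                                 trust_remote_code=True)
print(cfg.architectures)
```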