[Feature]: Support for InternVL-Chat-V1-5
🚀 The feature, motivation and pitch
OpenGVLab/InternVL-Chat-V1-2-Plus is an open-source alternative to GPT-4V. Can we please have support for it?
Alternatives
GPT-4V
Additional context
https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus and https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5
Yes! It's a strong model in my tests.
We are currently in the process of improving support for vision-language models. (See #4194)
@DarkLight1337 Can I currently deploy https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus on vLLM?
You'll at least have to write code for implementing the model in vLLM and registering it so it can be automatically initialized from the HuggingFace weights. You can refer to #4228 for an example.
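For reference, the registration step itself is only a couple of lines; the real work is the model class behind it. A minimal sketch, assuming a hypothetical `InternVLChatModel` class that you would still have to implement (forward pass, HuggingFace weight loading, etc.):

```python
from vllm import ModelRegistry

# Hypothetical module and class -- implementing the forward pass and
# weight loading is the actual work; registration alone is not enough.
from my_models.internvl import InternVLChatModel

# The first argument must match the architecture name declared in the
# checkpoint's config.json so vLLM can initialize it automatically.
ModelRegistry.register_model("InternVLChatModel", InternVLChatModel)
```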
I don't even code in Python. Can I just replace the tokenizer with the LLaVA 1.5 one and just load OpenGVLab/InternVL-Chat-V1-2-Plus?
That's not possible in general. You'll have to wait until someone implements the model in vLLM. The process of supporting a new vision-language model should become easier once we figure out a more extensible framework for multimodal support.
I think @czczup can implement this on vLLM.
@DarkLight1337 I don't know if it's a valid question, but does vLLM support all the architectures listed in the README? Does that mean I just need to replace the model name with another model of the same architecture, and it will work, right?
Yes, that should be the case.
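For example, a fine-tune that keeps a supported architecture loads with no code changes; only the model name you pass to vLLM differs. A minimal sketch (the checkpoint name is just an illustrative Llama-architecture fine-tune):

```python
from vllm import LLM, SamplingParams

# Works for any checkpoint whose config.json declares a supported
# architecture (e.g. LlamaForCausalLM); the name below is one example.
llm = LLM(model="NousResearch/Hermes-2-Pro-Llama-3-8B")
outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```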
If that's the case, then why can't I use InternVL-Chat-V1-5? It's InternLM2-based.
Being based on an existing model does not mean that the two have identical architectures.
InternVL-1.5's model card says that it is InternViT-6B-448px-V1-5 + MLP + InternLM2-Chat-20B. So, at least the first two components have to be added to the existing model.
If I'm not understanding you wrong, I need to register InternViT-6B-448px-V1-5 and InternLM2-Chat-20B, and then register InternVL-1.5? Is that correct?
I mean that you have to at least implement those two components in vLLM. If InternVL-v1.5 has additional layers, they have to be implemented in vLLM as well.
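To make the composition concrete, here is a rough structural sketch of how such a model is typically wired together. The class name, hidden dimensions, and forward signature are illustrative only, not vLLM's actual interface:

```python
import torch
import torch.nn as nn

class InternVLChatSketch(nn.Module):
    """Illustration of the InternVL-1.5 composition from the model card:
    vision encoder + MLP projector + language model."""

    def __init__(self, vision_encoder: nn.Module, language_model: nn.Module,
                 vit_hidden: int = 3200, llm_hidden: int = 6144):
        super().__init__()
        self.vision_encoder = vision_encoder  # InternViT-6B-448px-V1-5
        self.language_model = language_model  # InternLM2-Chat-20B
        # MLP projector mapping vision features into the LLM embedding space
        self.mlp = nn.Sequential(
            nn.Linear(vit_hidden, llm_hidden),
            nn.GELU(),
            nn.Linear(llm_hidden, llm_hidden),
        )

    def forward(self, pixel_values: torch.Tensor,
                text_embeds: torch.Tensor) -> torch.Tensor:
        # Encode the image, project into the text embedding space, then let
        # the LLM attend over [image tokens; text tokens].
        image_feats = self.vision_encoder(pixel_values)
        image_embeds = self.mlp(image_feats)
        return self.language_model(
            torch.cat([image_embeds, text_embeds], dim=1))
```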
Based on the config file, the model architecture should be the same as InternVL2, so I'm closing this as completed by #6514.
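For anyone who wants to verify this themselves, the architecture name can be read straight out of the checkpoint config; a small sketch using `transformers`:

```python
from transformers import AutoConfig

# If this prints the same architecture name as an InternVL2 checkpoint
# (e.g. ['InternVLChatModel']), the InternVL2 support from #6514 applies.
cfg = AutoConfig.from_pretrained(
    "OpenGVLab/InternVL-Chat-V1-5", trust_remote_code=True
)
print(cfg.architectures)
```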