
[Feature] Implement InternVL in llama.cpp

Open James4Ever0 opened this issue 1 year ago • 22 comments

Motivation

Many llama.cpp users have been requesting this. Ollama, one of the interfaces built on llama.cpp, is quite popular. Implementing this would significantly accelerate InternVL adoption and recognition.

Related resources

https://github.com/ggerganov/llama.cpp/issues/6803

Additional context

InternVL is based on the LLaMA architecture. The text-only InternLM models have already been ported to Ollama, but the multimodal ones have not.

James4Ever0 avatar Aug 20 '24 14:08 James4Ever0

Thank you for your suggestions. We will gradually push support for various frameworks, and we also welcome contributions from the community.

G-z-w avatar Aug 26 '24 05:08 G-z-w

Hey @czczup, would you please clarify the reason for closing this issue?

James4Ever0 avatar Sep 25 '24 06:09 James4Ever0

Any update on InternVL support in llama.cpp?

sammcj avatar Sep 25 '24 22:09 sammcj

Any update on InternVL support in llama.cpp?

cloudyuyuyu avatar Sep 27 '24 02:09 cloudyuyuyu

Thank you for your attention. We are actively progressing on this work, and we also welcome contributions from the community.

G-z-w avatar Sep 27 '24 03:09 G-z-w

Just curious: why was this issue closed if you are actively progressing on this work?

rampageservices avatar Sep 30 '24 06:09 rampageservices

Thanks for reopening this issue.

rampageservices avatar Oct 02 '24 03:10 rampageservices

any update?

cloudyuyuyu avatar Oct 14 '24 03:10 cloudyuyuyu

November 12, any update? Thanks in advance.

Cartomex-MX avatar Nov 13 '24 05:11 Cartomex-MX

Is there anything we could help with? :) InternVL2.5 is really important for the future. :)

BleedingDev avatar Dec 20 '24 20:12 BleedingDev

Paying close attention to this. Any update, please?

linxhome avatar Jan 06 '25 07:01 linxhome

Is it really that hard to do? I volunteer to implement it myself.

Current progress: https://github.com/ggerganov/llama.cpp/pull/9403

Also ipex-llm has support for InternVL2.

James4Ever0 avatar Jan 06 '25 08:01 James4Ever0

Is it really that hard to do? I volunteer to implement it myself.

Thank you for your willingness to help! We greatly appreciate your initiative and would be glad to have your contributions. If you need us to provide any content or information, feel free to let us know.

G-z-w avatar Jan 06 '25 08:01 G-z-w

@G-z-w Is this model based on the LLaVA architecture? What are the differences in input, output, internal parameters, and so on?

James4Ever0 avatar Jan 08 '25 04:01 James4Ever0

@G-z-w Is this model based on the LLaVA architecture? What are the differences in input, output, internal parameters, and so on?

This model, along with the other InternVL chat models, is similar to the LLaVA framework, with the specific structure shown in the link. The differences lie in dynamic resolution and pixel shuffle.

If convenient, we recommend prioritizing deployment of the InternVL 2.5 series. The parameter details are in the model card and the blog.

G-z-w avatar Jan 08 '25 05:01 G-z-w
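
For readers following along: the two deviations from plain LLaVA named above are dynamic resolution (the input image is split into a variable number of 448x448 tiles, plus a thumbnail, before the ViT) and pixel shuffle. Below is a minimal PyTorch sketch of the pixel-shuffle step, modeled on the published InternVL modeling code rather than copied from it; with the default scale_factor of 0.5 it merges each 2x2 block of ViT tokens into one token with 4x the channels, so a 448x448 tile drops from 1024 visual tokens to 256.

```python
import torch

def pixel_shuffle(x: torch.Tensor, scale_factor: float = 0.5) -> torch.Tensor:
    """Space-to-depth on a [N, W, H, C] grid of ViT features."""
    n, w, h, c = x.size()
    # [N, W, H, C] -> [N, W, H*scale, C/scale]
    x = x.view(n, w, int(h * scale_factor), int(c / scale_factor))
    # [N, W, H*scale, C/scale] -> [N, H*scale, W, C/scale]
    x = x.permute(0, 2, 1, 3).contiguous()
    # -> [N, H*scale, W*scale, C/(scale^2)]
    x = x.view(n, int(h * scale_factor), int(w * scale_factor),
               int(c / (scale_factor * scale_factor)))
    return x.permute(0, 2, 1, 3).contiguous()

# A 448x448 tile yields a 32x32 patch grid with hidden size 1024:
feats = torch.randn(1, 32, 32, 1024)
print(pixel_shuffle(feats).shape)  # torch.Size([1, 16, 16, 4096])
```

The shuffled tokens then pass through the MLP projector into the language model, which is why InternVL needs far fewer image tokens per tile than a naive LLaVA port would.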

@G-z-w Is this model based on the LLaVA architecture? What are the differences in input, output, internal parameters, and so on?

This model, along with the other InternVL chat models, is similar to the LLaVA framework, with the specific structure shown in the link. The differences lie in dynamic resolution and pixel shuffle.

If convenient, we recommend prioritizing deployment of the InternVL 2.5 series. The parameter details are in the model card and the blog.

Is the model structure of the v2.5 series identical to v1.5? I can now run v1.5 on llama.cpp: https://github.com/qlylangyu/llama.cpp/pull/1

James4Ever0 avatar Jan 14 '25 05:01 James4Ever0

@G-z-w Is this model based on the LLaVA architecture? What are the differences in input, output, internal parameters, and so on?

This model, along with the other InternVL chat models, is similar to the LLaVA framework, with the specific structure shown in the link. The differences lie in dynamic resolution and pixel shuffle. If convenient, we recommend prioritizing deployment of the InternVL 2.5 series. The parameter details are in the model card and the blog.

Is the model structure of the v2.5 series identical to v1.5? I can now run v1.5 on llama.cpp.

Yes, the structure of v2.5 is identical to that of the v1.5 series, except that the v2.5 series uses different language models.

G-z-w avatar Jan 14 '25 06:01 G-z-w

@James4Ever0 any luck running the v2.5 model?

Thank you

BakingBrains avatar Jan 31 '25 19:01 BakingBrains

I tried llama.cpp today; it is still not supported:

python3 convert_hf_to_gguf.py model_path
ERROR:hf-to-gguf:Model InternVLChatModel is not supported

Any update, by chance?

gryffindor-rr avatar Feb 14 '25 13:02 gryffindor-rr
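
For context on that error: convert_hf_to_gguf.py dispatches on the architectures field of the model's config.json against a registry of exporter classes, and "InternVLChatModel" simply isn't registered. A hypothetical sketch of what the missing registration might look like (the class body and the InternLM2 mapping are assumptions, not a working port; the vision tower and MLP projector would still need a separate LLaVA-style conversion):

```python
# Hypothetical sketch only. @Model.register binds an HF "architectures"
# string to an exporter class in convert_hf_to_gguf.py; without it the
# converter fails with "Model InternVLChatModel is not supported".
@Model.register("InternVLChatModel")
class InternVLChatModel(Model):
    model_arch = gguf.MODEL_ARCH.INTERNLM2  # assumption: InternLM2 text tower

    def modify_tensors(self, data_torch, name, bid):
        # The ViT and projector weights would have to be split out into a
        # separate clip-style GGUF, as in the existing LLaVA workflow.
        raise NotImplementedError
```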

any update?

MrZeros avatar Mar 03 '25 07:03 MrZeros

For anyone who is about to work with the current code, you can check my latest release here.

The 18 KB archive contains function-level diffs generated with Universal Ctags and some Python magic, so anyone with entry-level C++ knowledge should be able to merge the changes easily.

James4Ever0 avatar Mar 03 '25 10:03 James4Ever0
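
For anyone curious how such function-level diffs can be produced, here is a rough, self-contained sketch of the idea (the file names in the usage line are placeholders; it assumes Universal Ctags is installed, using its JSON output with end-line fields enabled):

```python
import difflib
import json
import subprocess
from pathlib import Path

def function_spans(path: str) -> dict[str, tuple[int, int]]:
    """Map function name -> (start_line, end_line) via Universal Ctags."""
    out = subprocess.run(
        ["ctags", "--output-format=json", "--fields=+ne", "-o", "-", path],
        capture_output=True, text=True, check=True).stdout
    spans = {}
    for line in out.splitlines():
        tag = json.loads(line)
        if tag.get("kind") == "function" and "end" in tag:
            spans[tag["name"]] = (tag["line"], tag["end"])
    return spans

def function_diffs(old_path: str, new_path: str):
    """Yield (name, unified_diff) for functions present in both files."""
    old_lines = Path(old_path).read_text().splitlines()
    new_lines = Path(new_path).read_text().splitlines()
    old_spans, new_spans = function_spans(old_path), function_spans(new_path)
    for name in sorted(old_spans.keys() & new_spans.keys()):
        (start_o, end_o), (start_n, end_n) = old_spans[name], new_spans[name]
        diff = list(difflib.unified_diff(
            old_lines[start_o - 1:end_o], new_lines[start_n - 1:end_n],
            fromfile=f"old/{name}", tofile=f"new/{name}", lineterm=""))
        if diff:
            yield name, "\n".join(diff)

# Placeholder file names for illustration:
for name, d in function_diffs("clip_old.cpp", "clip_new.cpp"):
    print(d, end="\n\n")
```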

+1

BrandonJull avatar Mar 08 '25 17:03 BrandonJull