
[Feature] Implement InternVL in llama.cpp

Open James4Ever0 opened this issue 1 year ago • 22 comments

Motivation

Many llama.cpp users have been requesting this. Ollama, one of the interfaces built on llama.cpp, is quite popular. Implementing this would significantly accelerate InternVL adoption and recognition.

Related resources

https://github.com/ggerganov/llama.cpp/issues/6803

Additional context

InternVL is based on the LLaMA architecture. The text-only InternLM models have already been ported to Ollama, but the multimodal ones have not.

James4Ever0 avatar Aug 20 '24 14:08 James4Ever0

Thank you for your suggestions. We will gradually push support for various frameworks, and we also welcome contributions from the community.

G-z-w avatar Aug 26 '24 05:08 G-z-w

Hey @czczup, would you please clarify the reason for closing this issue?

James4Ever0 avatar Sep 25 '24 06:09 James4Ever0

Any update on InternVL support in llama.cpp?

sammcj avatar Sep 25 '24 22:09 sammcj

Any update on InternVL support in llama.cpp?

cloudyuyuyu avatar Sep 27 '24 02:09 cloudyuyuyu

Thank you for your attention. We are actively progressing on this work, and we also welcome contributions from the community.

G-z-w avatar Sep 27 '24 03:09 G-z-w

Just curious: why was this issue closed if you are actively progressing on this work?

rampageservices avatar Sep 30 '24 06:09 rampageservices

Thanks for reopening this issue.

rampageservices avatar Oct 02 '24 03:10 rampageservices

any update?

cloudyuyuyu avatar Oct 14 '24 03:10 cloudyuyuyu

November 12, any update? Thanks in advance.

Cartomex-MX avatar Nov 13 '24 05:11 Cartomex-MX

Is there anything we could help with? :) InternVL2.5 is really important for the future. :)

BleedingDev avatar Dec 20 '24 20:12 BleedingDev

Paying close attention to this. Any update, please?

linxhome avatar Jan 06 '25 07:01 linxhome

Is it really that hard to do? I volunteer to implement it myself.

Current progress: https://github.com/ggerganov/llama.cpp/pull/9403

Also ipex-llm has support for InternVL2.

James4Ever0 avatar Jan 06 '25 08:01 James4Ever0

Is it really that hard to do? I volunteer to implement it myself.

Thank you for your willingness to help! We greatly appreciate your initiative and would be glad to have your contributions. If you need us to provide any content or information, feel free to let us know.

G-z-w avatar Jan 06 '25 08:01 G-z-w

@G-z-w Is this model based on the LLaVA architecture? What are the differences in input, output, internal parameters, and so on?

James4Ever0 avatar Jan 08 '25 04:01 James4Ever0

@G-z-w Is this model based on the LLaVA architecture? What are the differences in input, output, internal parameters, and so on?

This model, along with the other InternVL chat models, is similar to the LLaVA framework, with the specific structure shown in the link. The differences lie in dynamic resolution and pixel shuffle.

If convenient, we recommend prioritizing deployment of the InternVL 2.5 series. The parameter details are in the model card and the blog.

G-z-w avatar Jan 08 '25 05:01 G-z-w
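
For readers following along: the two deviations from plain LLaVA named above are dynamic resolution (the input image is split into a variable number of 448x448 tiles, plus a thumbnail, before the ViT) and pixel shuffle. Below is a minimal PyTorch sketch of the pixel-shuffle step, modeled on the published InternVL modeling code rather than copied from it; with the default scale_factor of 0.5 it merges each 2x2 block of ViT tokens into one token with 4x the channels, so a 448x448 tile drops from 1024 visual tokens to 256.

```python
import torch

def pixel_shuffle(x: torch.Tensor, scale_factor: float = 0.5) -> torch.Tensor:
    """Space-to-depth on a [N, W, H, C] grid of ViT features."""
    n, w, h, c = x.size()
    # [N, W, H, C] -> [N, W, H*scale, C/scale]
    x = x.view(n, w, int(h * scale_factor), int(c / scale_factor))
    # [N, W, H*scale, C/scale] -> [N, H*scale, W, C/scale]
    x = x.permute(0, 2, 1, 3).contiguous()
    # -> [N, H*scale, W*scale, C/(scale^2)]
    x = x.view(n, int(h * scale_factor), int(w * scale_factor),
               int(c / (scale_factor * scale_factor)))
    return x.permute(0, 2, 1, 3).contiguous()

# A 448x448 tile yields a 32x32 patch grid with hidden size 1024:
feats = torch.randn(1, 32, 32, 1024)
print(pixel_shuffle(feats).shape)  # torch.Size([1, 16, 16, 4096])
```

The shuffled tokens then pass through the MLP projector into the language model, which is why InternVL needs far fewer image tokens per tile than a naive LLaVA port would.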

@G-z-w Is this model based on the LLaVA architecture? What are the differences in input, output, internal parameters, and so on?

This model, along with the other InternVL chat models, is similar to the LLaVA framework, with the specific structure shown in the link. The differences lie in dynamic resolution and pixel shuffle.

If convenient, we recommend prioritizing deployment of the InternVL 2.5 series. The parameter details are in the model card and the blog.

Is the model structure of the v2.5 series identical to v1.5? I can now run v1.5 on llama.cpp: https://github.com/qlylangyu/llama.cpp/pull/1

James4Ever0 avatar Jan 14 '25 05:01 James4Ever0

@G-z-w Is this model based on the LLaVA architecture? What are the differences in input, output, internal parameters, and so on?

This model, along with the other InternVL chat models, is similar to the LLaVA framework, with the specific structure shown in the link. The differences lie in dynamic resolution and pixel shuffle. If convenient, we recommend prioritizing deployment of the InternVL 2.5 series. The parameter details are in the model card and the blog.

Is the model structure of the v2.5 series identical to v1.5? I can now run v1.5 on llama.cpp.

Yes, the structure of v2.5 is identical to that of the v1.5 series, except that the v2.5 series uses different language models.

G-z-w avatar Jan 14 '25 06:01 G-z-w

@James4Ever0 any luck running the v2.5 model?

Thank you

BakingBrains avatar Jan 31 '25 19:01 BakingBrains

I tried llama.cpp today; it is still not supported:

python3 convert_hf_to_gguf.py model_path
ERROR:hf-to-gguf:Model InternVLChatModel is not supported

Any update, by chance?

gryffindor-rr avatar Feb 14 '25 13:02 gryffindor-rr
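
For context on that error: convert_hf_to_gguf.py dispatches on the architectures field of the model's config.json against a registry of exporter classes, and "InternVLChatModel" simply isn't registered. A hypothetical sketch of what the missing registration might look like (the class body and the InternLM2 mapping are assumptions, not a working port; the vision tower and MLP projector would still need a separate LLaVA-style conversion):

```python
# Hypothetical sketch only. @Model.register binds an HF "architectures"
# string to an exporter class in convert_hf_to_gguf.py; without it the
# converter fails with "Model InternVLChatModel is not supported".
@Model.register("InternVLChatModel")
class InternVLChatModel(Model):
    model_arch = gguf.MODEL_ARCH.INTERNLM2  # assumption: InternLM2 text tower

    def modify_tensors(self, data_torch, name, bid):
        # The ViT and projector weights would have to be split out into a
        # separate clip-style GGUF, as in the existing LLaVA workflow.
        raise NotImplementedError
```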

any update?

MrZeros avatar Mar 03 '25 07:03 MrZeros

For anyone who is about to work with the current code, you can check my latest release here.

The 18 KB archive contains function-level diffs generated with Universal Ctags and some Python magic, so anyone with entry-level C++ knowledge should be able to merge the changes easily.

James4Ever0 avatar Mar 03 '25 10:03 James4Ever0
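
For anyone curious how such function-level diffs can be produced, here is a rough, self-contained sketch of the idea (the file names in the usage line are placeholders; it assumes Universal Ctags is installed, using its JSON output with end-line fields enabled):

```python
import difflib
import json
import subprocess
from pathlib import Path

def function_spans(path: str) -> dict[str, tuple[int, int]]:
    """Map function name -> (start_line, end_line) via Universal Ctags."""
    out = subprocess.run(
        ["ctags", "--output-format=json", "--fields=+ne", "-o", "-", path],
        capture_output=True, text=True, check=True).stdout
    spans = {}
    for line in out.splitlines():
        tag = json.loads(line)
        if tag.get("kind") == "function" and "end" in tag:
            spans[tag["name"]] = (tag["line"], tag["end"])
    return spans

def function_diffs(old_path: str, new_path: str):
    """Yield (name, unified_diff) for functions present in both files."""
    old_lines = Path(old_path).read_text().splitlines()
    new_lines = Path(new_path).read_text().splitlines()
    old_spans, new_spans = function_spans(old_path), function_spans(new_path)
    for name in sorted(old_spans.keys() & new_spans.keys()):
        (start_o, end_o), (start_n, end_n) = old_spans[name], new_spans[name]
        diff = list(difflib.unified_diff(
            old_lines[start_o - 1:end_o], new_lines[start_n - 1:end_n],
            fromfile=f"old/{name}", tofile=f"new/{name}", lineterm=""))
        if diff:
            yield name, "\n".join(diff)

# Placeholder file names for illustration:
for name, d in function_diffs("clip_old.cpp", "clip_new.cpp"):
    print(d, end="\n\n")
```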

+1

BrandonJull avatar Mar 08 '25 17:03 BrandonJull