InternVL 请问有推荐的部署框架吗?

我查阅了几个LLM部署推理框架，目前似乎都不支持InternVL，请问项目组有推荐使用的框架吗？

Apr 23 '24 05:04 Halflifefa

@Halflifefa 打个广告：

LMDeploy 目前支持 InternVL之前发布的6个VL模型。1.5会在这周支持。

这是相关文档： https://github.com/InternLM/lmdeploy/blob/main/docs/zh_cn/inference/vl_pipeline.md https://github.com/InternLM/lmdeploy/blob/main/docs/zh_cn/serving/api_server_vl.md

Apr 23 '24 14:04 irexyc

@Halflifefa 打个广告：

LMDeploy 目前支持 InternVL之前发布的6个VL模型。1.5会在这周支持。

这是相关文档： https://github.com/InternLM/lmdeploy/blob/main/docs/zh_cn/inference/vl_pipeline.md https://github.com/InternLM/lmdeploy/blob/main/docs/zh_cn/serving/api_server_vl.md

请问会支持1.5的量化吗？

Apr 24 '24 04:04 douyh

kv cache 目前支持在线量化，设置一下engine config就可以了。

weight 量化的话，internvl-1.5 的 llm 部分用的 internlm2，这个我们是支持awq量化的，但是由于internvlchat的结构跟internlm2不同（多了一层language model，所以需要更新一下mapping，后面会支持起来，并且之后也会去兼容gptq的格式。

Apr 24 '24 05:04 irexyc

@Halflifefa 打个广告：

LMDeploy 目前支持 InternVL之前发布的6个VL模型。1.5会在这周支持。

这是相关文档： https://github.com/InternLM/lmdeploy/blob/main/docs/zh_cn/inference/vl_pipeline.md https://github.com/InternLM/lmdeploy/blob/main/docs/zh_cn/serving/api_server_vl.md

这个相比直接用pytorch推理，速度和显存有啥变化吗？

Apr 24 '24 09:04 xylcbd

@xylcbd

我们主要的卖点是LLM部分，这部分用的是用的自己写的引擎，该有的优化都有，对标的是vllm。

多模态的话，是顺便支持的，因为目前大部分多模态的结构，跟LLM一样，只是输入有一部分是视觉的embedding。视觉部分复用了模型本身的视觉模型(把LLM部分删掉了）。

LLM 部分显存，速度提升明显。视觉部分显存，速度应该跟pytorch差不多（也会组batch，但是发现提升并不明显）

Apr 24 '24 09:04 irexyc

加油最好能有量化一半或者四分之一的。

Apr 26 '24 08:04 sunjunlishi

@irexyc 感谢lmdeploy大佬的支持

Apr 26 '24 17:04 czczup

https://github.com/InternLM/lmdeploy/blob/main/docs/zh_cn/inference/vl_pipeline.md 请问现在支持 1.5 了吗

Apr 28 '24 07:04 zhongtao93

@irexyc

Apr 28 '24 07:04 zhongtao93

@zhongtao93

有个PR还没合入 https://github.com/InternLM/lmdeploy/pull/1490

不过合入之后，需要自己编译代码，或者拷贝最新的python代码替换掉安装目录中的代码。

Apr 28 '24 07:04 irexyc

@Halflifefa 打个广告：

LMDeploy 目前支持 InternVL之前发布的6个VL模型。1.5会在这周支持。

这是相关文档： https://github.com/InternLM/lmdeploy/blob/main/docs/zh_cn/inference/vl_pipeline.md https://github.com/InternLM/lmdeploy/blob/main/docs/zh_cn/serving/api_server_vl.md

您好请问1.5现在已经支持了吗

Apr 30 '24 02:04 ruifengma

您好请问1.5现在已经支持了吗

支持了，现在用的话，可以自己编译，或者等我们5月8号发版

Apr 30 '24 02:04 irexyc

您好请问1.5现在已经支持了吗

支持了，现在用的话，可以自己编译，或者等我们5月8号发版

多谢～

Apr 30 '24 02:04 ruifengma

install 0.4.0加替换最新master 代码，已经跑通1.5了，想问下 https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5-Int8 这个量化可以直接用吗，还是需要做适配

May 07 '24 06:05 zhongtao93

@zhongtao93

不能直接用。这里有一些说明可以参考 https://github.com/InternLM/lmdeploy/issues/1522#issuecomment-2084929542

lmdeploy 支持vlm的量化工具会在月底支持。

btw，我觉得 OpenGVLab/InternVL-Chat-V1-5-Int8 这个模型没有存在的必要。他不是awq/gptq这种离线量化，只是借助bitsandbytes 做在线量化，完全可以加载fp模型的时候传个load_in_8bit/load_in_4bit参数，没必要去存储序列化模型。

May 07 '24 07:05 irexyc

您好请问1.5现在已经支持了吗

支持了，现在用的话，可以自己编译，或者等我们5月8号发版

怎么跑在多卡上呢，单卡显存容量不够，提示oom了

May 08 '24 04:05 winca

The 4-bit version of the model has been released. Check it out at OpenGVLab/InternVL-Chat-V1-5-AWQ.

Thanks to the lmdeploy team for their support with model quantization. I'm closing this issue now, but if you encounter any problems, please don't hesitate to reopen it.

May 30 '24 12:05 czczup