InternVL InternVL2-40B模型出现输出混乱的现象

启动方式： lmdeploy serve api_server /root/wangjianqiang/InternLM/OpenGVLab/InternVL2-40B-1/ --server-name 0.0.0.0 --server-port 9014 --model-name internVL --tp 4 --log-level INFO --backend turbomind --chat-template /root/wangjianqiang/InternLM/OpenGVLab/chat_template.json

调用方式使用的openai方式

输出结果：总总总总总总总总总总总总总总总总总总总总总 specially subscribers subscribers subscribers temper temper temper temper temper temper temper temper temper temper temper temper temper temper temper temper temper temper temper temper temper temper temper temper temper temper temper temper temper temper temper temper（后面还有很长一串）

是模型出现问题了吗？还是配置不正确

Jul 15 '24 08:07 WangJianQ-0118

老哥，你的40B模型部署需要多少显存

Jul 15 '24 09:07 SeanWu999

我也遇见了这个问题，使用的是lmdeploy pipeline调用方式。换到26B的模型就可以正常输出

Jul 15 '24 11:07 Xu-Jianjun

老哥，你的40B模型部署需要多少显存

老哥，可以看看这里 https://github.com/Czi24/Awesome-MLLM-LLM-Colab/blob/master/MLLM/InternVL-colab/InternVL.md

Jul 15 '24 15:07 Czi24

老哥，你的40B模型部署需要多少显存

老哥，可以看看这里 https://github.com/Czi24/Awesome-MLLM-LLM-Colab/blob/master/MLLM/InternVL-colab/InternVL.md

问一下这里面的显存是模型刚加载好的显存还是推理达到max_tokens时的显存，大概算了一下是刚加载好的显存？

Jul 20 '24 16:07 wciq1208

老哥，你的40B模型部署需要多少显存

老哥，可以看看这里 https://github.com/Czi24/Awesome-MLLM-LLM-Colab/blob/master/MLLM/InternVL-colab/InternVL.md

问一下这里面的显存是模型刚加载好的显存还是推理达到max_tokens时的显存，大概算了一下是刚加载好的显存？

您好，这里的显存是加载好模型之后，简单跑了几次image caption之后的显存，没有推理到max_tokens；如果到了max_tokens，应该还会再额外多占用一些显存。

Sep 06 '24 14:09 czczup

Hi, since there hasn't been any recent activity on this issue, I'll be closing it for now. If it's still an active concern, don't hesitate to reopen it. Thanks for your understanding!

Dec 09 '24 11:12 czczup