Decoupled Vision-Language Deployment support?

Open • happened opened this issue 4 months ago • 5 comments

https://arxiv.org/abs/2508.18265

How do I use DvD? Does LMDeploy support it?

happened (Aug 28 '25 07:08)

Hi, the DvD setup reported for InternVL3.5 does not use LMDeploy as the inference backend. However, we plan to support a similar feature in LMDeploy, so please stay tuned.

CUHKSZzxy (Aug 28 '25 08:08)

Is there any chance that decoupled vision-language deployment will be supported in vLLM or SGLang? Is there a roadmap for when the Flash version will be released? And is it possible to use fp8 instead of bf16 in decoupled vision-language deployment?

eren-ay (Aug 31 '25 11:08)

  1. We will support decoupled vision-language deployment in LMDeploy, not in vLLM / SGLang. For DvD support in those frameworks, please consult the vLLM / SGLang teams.
  2. We expect to provide a draft version in September. Please stay tuned.
  3. We will consider bf16 first, but I think DvD is independent of the precision format: whether the model runs in bf16 or fp8 depends on the model weights themselves.

CUHKSZzxy (Sep 01 '25 02:09)

When DvD support arrives, will the Flash versions of InternVL3.5 also be released? Will LMDeploy support the Flash version when running with DvD? It was stated that the InternVL3.5-Flash patch router determines the compression level. Will we be able to control the patch router's compression through LMDeploy? What I really want to know is whether I will be able to combine DvD and the Flash version in LMDeploy to get the maximum speed.

eren-ay (Sep 03 '25 00:09)

  1. As stated by the InternVL team in the documentation:

"The Flash version of our model will be released as soon as possible."

Once it has been released, LMDeploy will support it as quickly as we can.

  2. Yes, you will be able to use LMDeploy to achieve the maximum speed for the InternVL series. However, the Flash model weights (which should also contain the trained router weights) have not been open-sourced yet, so please wait for related updates.

CUHKSZzxy (Sep 03 '25 12:09)