Decoupled Vision-Language Deployment support?
https://arxiv.org/abs/2508.18265
How do I use DvD? Does LMDeploy support it?
Hi, the DvD setup reported in the InternVL3.5 paper does not use LMDeploy as the inference backend. We do plan to support similar features in LMDeploy, so please stay tuned.
Is there any possibility of supporting decoupled vision-language deployment in vLLM or SGLang? Is there a roadmap for when the Flash version will be released? Is it possible to use fp8 instead of bf16 in decoupled vision-language deployment?
- We will support decoupled vision-language deployment in LMDeploy, not in vLLM / SGLang. For DvD support in vLLM / SGLang, please consult the vLLM / SGLang teams.
- We expect to provide a draft version in September. Please stay tuned.
- We will first consider bf16, but I think DvD is independent of the precision format. Whether the model uses bf16 or fp8 depends on the model weights themselves.
When DvD support arrives, will the Flash versions of InternVL3.5 also be released? Will LMDeploy support the Flash version when running with DvD? The documentation says that the InternVL3.5 Flash patch router determines the compression level. Will we be able to control the patch router's compression via LMDeploy? What I really want to know is: will I be able to use DvD and the Flash version together with LMDeploy to get the maximum speed?
- As stated by the InternVL team in the documentation:
  "The Flash version of our model will be released as soon as possible."
  Once it has been released, LMDeploy will support it as quickly as we can.
- Yes, you will be able to use LMDeploy to achieve the maximum speed for the InternVL series. However, the Flash model weights (which should also include the trained router weights) have not been open-sourced yet, so please wait for further updates.