[Feature] Optimize prefix caching support for VLMs
Motivation
vLLM has released its V1 engine, which supports prefix caching for multimodal LLMs. As a comparable inference engine, I wish LMDeploy had the same feature :)
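For context, a minimal sketch of how prefix caching is switched on in LMDeploy today via the engine config; the model path is illustrative, and whether the flag takes effect for VLM inputs is exactly what this issue asks about:

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# enable_prefix_caching reuses KV-cache blocks for requests that share
# a common prompt prefix (e.g. the same system prompt or image context)
engine_config = TurbomindEngineConfig(enable_prefix_caching=True)

# Illustrative VLM checkpoint; the request is for the flag above to
# also cover the multimodal (image-token) portion of the prefix.
pipe = pipeline('OpenGVLab/InternVL2_5-8B', backend_config=engine_config)
print(pipe(['Describe the image.']))
```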
Related resources
No response
Additional context
No response
Hi @torinchen, thanks for your attention. This feature is in progress. BTW, can you share which VLMs you intend to use with prefix caching? Thanks!
InternVL2.5 and the Qwen-VL series.