[Feature] Optimize prefix caching support for VLMs
Motivation
vLLM has released its V1 engine, which supports prefix caching for multimodal LLMs. As a comparable inference engine, I wish LMDeploy had the same feature :)
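For context, a minimal sketch of how prefix caching is switched on in LMDeploy today via the engine config; the model path is illustrative, and whether the flag takes effect for VLM inputs is exactly what this issue asks about:

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# enable_prefix_caching reuses KV-cache blocks for requests that share
# a common prompt prefix (e.g. the same system prompt or image context)
engine_config = TurbomindEngineConfig(enable_prefix_caching=True)

# Illustrative VLM checkpoint; the request is for the flag above to
# also cover the multimodal (image-token) portion of the prefix.
pipe = pipeline('OpenGVLab/InternVL2_5-8B', backend_config=engine_config)
print(pipe(['Describe the image.']))
```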
Related resources
No response
Additional context
No response
Hi @torinchen, thanks for your attention. This feature is in progress. BTW, can you share which VLMs you intend to use with prefix caching? Thanks!
InternVL2.5 and the Qwen-VL series.