mlx-vlm
Add support for prompt caching (image + video)
To solve #136, we can load the KV cache (attention sink) and compute values only for the new image and prompt tokens.
Note:
- Save each image hash in the cached prompts/features for efficient loading.
- Prefix caching (to investigate):
  - https://docs.vllm.ai/en/latest/automatic_prefix_caching/apc.html
  - https://www.restack.io/p/large-language-models-answer-prefix-caching-cat-ai
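To make the prefix-caching idea concrete: vLLM's automatic prefix caching (linked above) splits the token sequence into fixed-size blocks and reuses the KV cache for every leading block already seen. A rough illustration of that matching logic, with `block_size` and the helper name chosen here for illustration only:

```python
def longest_cached_prefix(tokens, cached_blocks, block_size=16):
    """Count how many leading tokens are covered by already-cached blocks.

    `cached_blocks` is a set of token-tuples of length `block_size`;
    matching stops at the first block that is not cached.
    """
    matched = 0
    for start in range(0, len(tokens) - block_size + 1, block_size):
        block = tuple(tokens[start:start + block_size])
        if block not in cached_blocks:
            break
        matched = start + block_size
    return matched
```

Only the tokens past `matched` would need prefill, which is where the savings come from when many requests share the same image/prompt prefix.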