
Add support for prompt caching (image + video)

Blaizzy opened this issue · 1 comment

Blaizzy · Dec 15 '24 16:12

To solve #136, we can load the KV cache (attention sink) and compute values only for the new image and prompt tokens.
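A minimal sketch of the idea, in pure Python so it runs standalone. The `PromptCache` class and `compute_kv` helper are hypothetical stand-ins, not mlx-vlm API: the point is that a second prompt sharing a prefix with the cached one only needs a forward pass over its new suffix.

```python
# Hypothetical sketch of prompt caching: reuse previously computed KV
# entries for a shared token prefix and run the forward pass only on
# the new tokens. Names here are illustrative, not mlx-vlm API.

class PromptCache:
    def __init__(self):
        self.tokens = []  # tokens whose KV entries are already cached
        self.kv = []      # one (mock) KV entry per cached token

    def extend(self, new_tokens):
        """Return the tokens that still need a forward pass, reusing
        the cached prefix where the new prompt matches it."""
        common = 0
        for a, b in zip(self.tokens, new_tokens):
            if a != b:
                break
            common += 1
        suffix = new_tokens[common:]
        # Drop stale entries past the divergence point, append new KV.
        self.tokens = new_tokens[:common] + suffix
        self.kv = self.kv[:common] + [compute_kv(t) for t in suffix]
        return suffix


def compute_kv(token):
    # Stand-in for an attention forward pass producing K/V for a token.
    return ("kv", token)


cache = PromptCache()
first = cache.extend(["<img>", "describe", "this"])
second = cache.extend(["<img>", "describe", "the", "scene"])
print(len(first), len(second))  # second call recomputes only the suffix
```

In a real implementation the cached entries would be per-layer key/value tensors rather than per-token tuples, and the image tokens would dominate the savings since the vision prefix is usually the longest shared prefix.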

Note:

  1. Save each image's hash alongside its cached prompt tokens/features so repeated images can be loaded efficiently.
  2. Investigate prefix caching:

https://docs.vllm.ai/en/latest/automatic_prefix_caching/apc.html

https://www.restack.io/p/large-language-models-answer-prefix-caching-cat-ai

Blaizzy · Dec 30 '24 02:12