mlx-vlm
Add support for prompt caching (image + video)
To solve #136, we can load the KV cache (attention sink) and compute values only for the new image and prompt tokens.
Note:
- Save each image hash in the cached prompts/features for efficient loading.
- Prefix caching (to investigate):
  - https://docs.vllm.ai/en/latest/automatic_prefix_caching/apc.html
  - https://www.restack.io/p/large-language-models-answer-prefix-caching-cat-ai
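To make the prefix-caching idea concrete: vLLM's automatic prefix caching (linked above) splits the token sequence into fixed-size blocks and reuses the KV cache for every leading block already seen. A rough illustration of that matching logic, with `block_size` and the helper name chosen here for illustration only:

```python
def longest_cached_prefix(tokens, cached_blocks, block_size=16):
    """Count how many leading tokens are covered by already-cached blocks.

    `cached_blocks` is a set of token-tuples of length `block_size`;
    matching stops at the first block that is not cached.
    """
    matched = 0
    for start in range(0, len(tokens) - block_size + 1, block_size):
        block = tuple(tokens[start:start + block_size])
        if block not in cached_blocks:
            break
        matched = start + block_size
    return matched
```

Only the tokens past `matched` would need prefill, which is where the savings come from when many requests share the same image/prompt prefix.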