# kvcached
## 📈 Roadmap
### 🎯 Q2 2025
- [X] Command-line tools to check and configure the physical memory usage and limits of each running instance
- [X] Support tensor parallelism
- [X] Performance optimizations for physical memory management
- [X] Transparent inference engine integration
### 🚀 Q3 2025
- [ ] Integration with llama.cpp (and, by extension, Ollama and LM Studio)
- [ ] Support pipeline parallelism
- [ ] Support various attention types (sliding-window attention, linear attention, vision encoders, etc.)
### 🌟 Q4 2025
- [ ] Support prefix caching
- [ ] Support KV cache offloading to host memory
- [ ] Support AMD GPUs and Intel GPUs