📈 Roadmap

Open ivanium opened this issue 3 months ago • 0 comments

🎯 Q2 2025

  • [X] Command-line tools to check and configure the physical memory usage and limit of each running instance
  • [X] Support tensor parallelism
  • [X] Performance optimizations for physical memory management
  • [X] Transparent inference engine integration

🚀 Q3 2025

  • [ ] Integration with llama.cpp (and hence Ollama and LM Studio)
  • [ ] Support pipeline parallelism
  • [ ] Support various attention types (sliding window attention, linear attention, vision encoder, etc.)

🌟 Q4 2025

  • [ ] Support prefix caching
  • [ ] Support KV cache offloading to host memory
  • [ ] Support AMD GPUs and Intel GPUs

ivanium · Sep 17 '25 00:09