# kvcached
## 📈 Roadmap
### 🎯 Q2 2025
- [X] Command-line tools to check and configure the physical memory usage and limits of each running instance
- [X] Support tensor parallelism
- [X] Performance optimizations for physical memory management
- [X] Transparent inference engine integration
### 🚀 Q3 2025
- [ ] Integration with llama.cpp (and, by extension, Ollama and LM Studio)
- [ ] Support pipeline parallelism
- [ ] Support various attention types (sliding-window attention, linear attention, vision encoders, etc.)
### 🌟 Q4 2025
- [ ] Support prefix caching
- [ ] Support KV cache offloading to host memory
- [ ] Support AMD GPUs and Intel GPUs