GitHub topic: kv-cache-compression
Q-LLM
Official repository for "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
Context-Memory
PyTorch implementation of "Compressed Context Memory for Online Language Model Interaction" (ICLR 2024)
xKV
xKV: Cross-Layer SVD for KV-Cache Compression
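As a rough illustration of the idea in xKV's tagline (this is not the repo's actual code; the layer grouping, tensor shapes, and rank below are assumptions), one can jointly factor the key caches of several layers with a single truncated SVD and store only the small factors instead of the full stacked cache:

```python
# Illustrative cross-layer low-rank compression sketch (not xKV's implementation).
import torch

def compress_keys_cross_layer(key_caches, rank=64):
    """key_caches: list of [seq_len, head_dim] tensors, one per layer in a group."""
    stacked = torch.cat(key_caches, dim=-1)            # [seq_len, num_layers * head_dim]
    U, S, Vh = torch.linalg.svd(stacked, full_matrices=False)
    U_r, S_r, Vh_r = U[:, :rank], S[:rank], Vh[:rank]  # truncate to rank r
    # Store [seq_len, r] coefficients plus an [r, num_layers * head_dim] basis.
    return U_r * S_r, Vh_r

def reconstruct_keys(coeffs, basis, head_dim):
    approx = coeffs @ basis                            # [seq_len, num_layers * head_dim]
    return list(approx.split(head_dim, dim=-1))        # back to per-layer key caches

keys = [torch.randn(1024, 128) for _ in range(4)]      # toy caches for a 4-layer group
coeffs, basis = compress_keys_cross_layer(keys, rank=64)
recovered = reconstruct_keys(coeffs, basis, head_dim=128)
```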
KVCache-Factory
Unified KV Cache Compression Methods for Auto-Regressive Models
kvpress
LLM KV cache compression made easy
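A minimal usage sketch in the spirit of the pattern kvpress documents (a custom transformers pipeline plus a "press" object); the model name, press class, and compression ratio here are assumptions and should be checked against the repo's README:

```python
# Sketch of kvpress-style usage; verify pipeline and class names against the repo.
from transformers import pipeline
from kvpress import ExpectedAttentionPress  # one of several available "press" strategies

pipe = pipeline(
    "kv-press-text-generation",             # custom pipeline registered by kvpress
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # example model (assumption)
    device="cuda:0",
    torch_dtype="auto",
)

press = ExpectedAttentionPress(compression_ratio=0.5)  # drop roughly half of the KV cache
context = "..."   # long document whose cache gets compressed
question = "..."  # query answered against the compressed cache
answer = pipe(context, question=question, press=press)["answer"]
```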
Awesome-LLM-KV-Cache
A curated list of 📙 awesome LLM KV cache papers with code.
KVzip
[NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3)
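Eviction methods like this one score cached key/value pairs and drop the low-scoring ones. The scaffold below is a generic illustration with a placeholder norm-based score, not KVzip's actual query-agnostic scoring rule:

```python
# Generic KV cache eviction scaffold (placeholder scoring, not KVzip's method).
import torch

def evict_kv(keys, values, keep_ratio=0.3):
    """keys/values: [num_heads, seq_len, head_dim]; returns pruned caches."""
    seq_len = keys.shape[1]
    keep = max(1, int(seq_len * keep_ratio))
    scores = keys.norm(dim=-1)                               # [num_heads, seq_len] importance proxy
    idx = scores.topk(keep, dim=-1).indices.sort(-1).values  # keep top-k, preserve token order
    gather = idx.unsqueeze(-1).expand(-1, -1, keys.shape[-1])
    return keys.gather(1, gather), values.gather(1, gather)

k = torch.randn(8, 4096, 128)
v = torch.randn(8, 4096, 128)
k_small, v_small = evict_kv(k, v, keep_ratio=0.25)           # ~4x fewer cached tokens
```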
block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
Palu
[ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection
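Low-rank projection takes a different route from eviction: instead of dropping tokens, each cached vector is stored in a smaller latent dimension and expanded back on read. A minimal sketch of that idea (not Palu's actual code; the dimensions and the untrained projections are placeholders):

```python
# Illustrative low-rank value-cache projection (not Palu's implementation).
import torch
import torch.nn as nn

class LowRankValueCache(nn.Module):
    def __init__(self, head_dim=128, latent_dim=32):
        super().__init__()
        # In practice these projections would come from an SVD of the weights or be learned.
        self.down = nn.Linear(head_dim, latent_dim, bias=False)  # compress on write
        self.up = nn.Linear(latent_dim, head_dim, bias=False)    # reconstruct on read

    def write(self, v):             # v: [seq_len, head_dim]
        return self.down(v)         # store [seq_len, latent_dim] instead

    def read(self, v_latent):
        return self.up(v_latent)    # approximate original values

cache = LowRankValueCache()
v = torch.randn(2048, 128)
stored = cache.write(v)             # 4x smaller along the feature dimension
approx = cache.read(stored)
```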
SCOPE
(ACL 2025 oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation