kv-cache-compression topic

Repositories tagged with the kv-cache-compression topic

Q-LLM

55 Stars · 5 Forks · 55 Watchers

This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"

Context-Memory

62 Stars · 3 Forks · 62 Watchers

PyTorch implementation for "Compressed Context Memory For Online Language Model Interaction" (ICLR'24)

xKV

42 Stars · 4 Forks · 42 Watchers

xKV: Cross-Layer SVD for KV-Cache Compression

KVCache-Factory

1.3k Stars · 159 Forks · 1.3k Watchers

Unified KV Cache Compression Methods for Auto-Regressive Models
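For orientation, a short calculation estimates the quantity these compression methods try to shrink. Every layer stores one key vector and one value vector per KV head for every cached position. The helper `kv_cache_bytes` and the model configuration below are hypothetical. They were chosen only for illustration and are not taken from KVCache-Factory or any other repository listed here.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1,
                   bytes_per_elem: int = 2) -> int:
    """Size of an uncompressed KV cache in bytes (hypothetical helper).

    The leading factor 2 accounts for storing both keys and values;
    bytes_per_elem=2 corresponds to fp16/bf16 storage.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


# Hypothetical configuration, for illustration only.
size = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=32_768)
print(f"{size / 2**30:.1f} GiB")  # → 4.0 GiB
```

The size grows linearly with `seq_len` and `batch_size`. This is why long-context generation is where KV cache compression pays off most.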

kvpress

726 Stars · 83 Forks · 726 Watchers

LLM KV cache compression made easy

Awesome-LLM-KV-Cache

404 Stars · 25 Forks · 404 Watchers

Awesome-LLM-KV-Cache: A curated list of 📙 Awesome LLM KV Cache Papers with Code.

KVzip

169 Stars · 8 Forks · 169 Watchers

[NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3)
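Budget-based eviction can be sketched in its simplest form: keep the highest-scoring cache positions and discard the rest. This is a generic illustration under stated assumptions. The per-position importance scores are taken as given, and the function `evict_kv` is hypothetical; it does not reproduce how KVzip computes its scores. In a query-agnostic setting, such scores would be computed once, independently of any future query. Keeping about one third of the entries corresponds to the lower end of the 3× to 4× memory reduction quoted in the description.

```python
from typing import List, Sequence, Tuple


def evict_kv(keys: Sequence[object], values: Sequence[object],
             scores: Sequence[float],
             keep_ratio: float) -> Tuple[List[object], List[object], List[int]]:
    """Hypothetical top-k eviction for one attention head.

    keys, values and scores are parallel per-position sequences. The
    importance scores are assumed to be precomputed by some external method.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n = len(scores)
    if not (len(keys) == len(values) == n):
        raise ValueError("keys, values and scores must have equal length")
    budget = max(1, round(n * keep_ratio))
    # Rank positions by score (highest first) and keep only the budget.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:budget]
    # Restore sequence order so surviving entries stay aligned with their positions.
    kept = sorted(top)
    return [keys[i] for i in kept], [values[i] for i in kept], kept


keys = [f"k{i}" for i in range(6)]
values = [f"v{i}" for i in range(6)]
scores = [0.9, 0.1, 0.4, 0.8, 0.05, 0.7]
k, v, idx = evict_kv(keys, values, scores, keep_ratio=1 / 3)
print(idx)  # → [0, 3]
```

Returning the kept indices in ascending order preserves the relative order of the surviving tokens. Only the number of stored entries changes.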

block-transformer

162 Stars · 9 Forks · 162 Watchers

Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)

Palu

150 Stars · 12 Forks · 150 Watchers

[ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection
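A toy example illustrates the general idea of low-rank projection. Suppose a key projection factors as W_k ≈ A·B with a small inner rank r. It is then enough to cache the r-dimensional latent X·A per token and rebuild the keys as (X·A)·B when attention needs them. The matrices and the `matmul` helper below are hypothetical, and the factorization is chosen to be exact. In practice, the factors would come from an approximate decomposition such as a truncated SVD of the trained weight. This sketch is not Palu's implementation.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product (rows of a times columns of b)."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
            for row in a]


# Hypothetical factors of a key projection W_k = A @ B with rank r = 1.
A = [[1.0], [0.0], [2.0], [0.0]]   # d x r   (hidden size d = 4)
B = [[0.5, 1.0, 0.0, 2.0]]         # r x d_k (key size d_k = 4)
W_k = matmul(A, B)                 # d x d_k weight, exactly rank 1 here

# Hidden states for 3 tokens (seq_len x d).
X = [[1.0, 2.0, 0.0, 1.0],
     [0.0, 1.0, 1.0, 0.0],
     [2.0, 0.0, 1.0, 3.0]]

latent = matmul(X, A)     # what gets cached: seq_len x r
keys = matmul(latent, B)  # keys rebuilt on demand: seq_len x d_k

assert keys == matmul(X, W_k)  # exact because W_k has rank r
print(len(X) * len(B[0]), "->", len(latent) * len(A[0]), "cached values")  # → 12 -> 3 cached values
```

The saving is a factor of d_k / r per cached token. The cost is one extra matrix multiply to reconstruct the keys, and B is stored once per layer rather than once per token.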

SCOPE

33 Stars · 3 Forks · 33 Watchers

(ACL 2025 oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation