kv-cache topic
godis
A Redis server and distributed cluster implemented in Go
cappr
Completion After Prompt Probability: make your LLM make a choice
H2O
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
pytorch-llama-notes
Notes about the LLaMA 2 model
EasyKV
Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262)
LLaMA2
This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT) variant. The implementation focuses on the model architecture...
LMCache
Supercharge Your LLM with the Fastest KV Cache Layer
blackbird
A high-performance RDMA distributed file system for fast LLM Inference and GPU Training
Deepdive-llama3-from-scratch
Implement llama3 inference step by step: grasp the core concepts, follow the process derivation, and write the code.
KVCache-Factory
Unified KV Cache Compression Methods for Auto-Regressive Models
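The repositories above all build on the same underlying idea: during autoregressive decoding, the key/value vectors of already-processed tokens are cached so each new token only computes its own K and V, then attends over the cache. A minimal NumPy sketch of that pattern (the `KVCache` class and all names here are hypothetical, not taken from any repository listed above):

```python
import numpy as np

def attend(q, K, V):
    # Scaled dot-product attention for a single query vector q
    # against all cached keys K and values V.
    scores = K @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

class KVCache:
    """Append-only per-token key/value store (illustrative only)."""
    def __init__(self):
        self.K, self.V = [], []

    def append(self, k, v):
        self.K.append(k)
        self.V.append(v)

    def tensors(self):
        return np.stack(self.K), np.stack(self.V)

rng = np.random.default_rng(0)
d = 8
cache = KVCache()
# Decode 5 tokens; each step reuses the cached K/V of earlier tokens
# instead of recomputing them from scratch.
for t in range(5):
    q = rng.normal(size=d)  # query projection of the new token
    k = rng.normal(size=d)  # its key projection
    v = rng.normal(size=d)  # its value projection
    cache.append(k, v)
    K, V = cache.tensors()
    out = attend(q, K, V)   # attends over all t+1 cached positions
```

Without the cache, step `t` would recompute K and V for all `t` previous tokens, making decoding quadratic in sequence length; the cache trades that compute for memory, which is exactly the memory cost that compression projects like H2O and KVCache-Factory target.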