kv-cache topic
godis
A Redis server and distributed cluster implemented in Go
cappr
Completion After Prompt Probability: make your LLM make a choice
H2O
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
pytorch-llama-notes
Notes about the LLaMA 2 model
EasyKV
Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262)
LLaMA2
This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT) variant. The implementation focuses on the model architecture...
LMCache
Supercharge Your LLM with the Fastest KV Cache Layer
blackbird
A high-performance RDMA distributed file system for fast LLM Inference and GPU Training
Deepdive-llama3-from-scratch
Implement llama3 inference step by step: grasp the core concepts, follow the process derivation, and write the code.
KVCache-Factory
Unified KV Cache Compression Methods for Auto-Regressive Models
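The repositories above all build on the same underlying idea: during autoregressive decoding, the key/value vectors of already-processed tokens are cached so each new token only computes its own K and V, then attends over the cache. A minimal NumPy sketch of that pattern (the `KVCache` class and all names here are hypothetical, not taken from any repository listed above):

```python
import numpy as np

def attend(q, K, V):
    # Scaled dot-product attention for a single query vector q
    # against all cached keys K and values V.
    scores = K @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

class KVCache:
    """Append-only per-token key/value store (illustrative only)."""
    def __init__(self):
        self.K, self.V = [], []

    def append(self, k, v):
        self.K.append(k)
        self.V.append(v)

    def tensors(self):
        return np.stack(self.K), np.stack(self.V)

rng = np.random.default_rng(0)
d = 8
cache = KVCache()
# Decode 5 tokens; each step reuses the cached K/V of earlier tokens
# instead of recomputing them from scratch.
for t in range(5):
    q = rng.normal(size=d)  # query projection of the new token
    k = rng.normal(size=d)  # its key projection
    v = rng.normal(size=d)  # its value projection
    cache.append(k, v)
    K, V = cache.tensors()
    out = attend(q, K, V)   # attends over all t+1 cached positions
```

Without the cache, step `t` would recompute K and V for all `t` previous tokens, making decoding quadratic in sequence length; the cache trades that compute for memory, which is exactly the memory cost that compression projects like H2O and KVCache-Factory target.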