fast-inference topic


pytorch-slimming (562 stars, 95 forks)

Learning Efficient Convolutional Networks through Network Slimming (ICCV 2017).
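The core idea of network slimming is to train with an L1 penalty on the BatchNorm scale factors (gamma) and then prune the channels whose scales fall below a global percentile threshold. A minimal sketch of that channel-selection step, with illustrative names and plain Python lists standing in for the per-layer gamma tensors:

```python
# Hypothetical sketch of the channel-selection step in network slimming:
# channels whose BatchNorm scale |gamma| falls below a global percentile
# threshold are pruned. Function name and threshold rule are illustrative.

def select_channels(gammas, prune_ratio):
    """Return per-layer boolean keep-masks given BN gamma values per layer."""
    flat = sorted(abs(g) for layer in gammas for g in layer)
    # Global threshold: prune the smallest `prune_ratio` fraction of channels.
    threshold = flat[int(prune_ratio * len(flat))]
    return [[abs(g) >= threshold for g in layer] for layer in gammas]

# Two layers with 3 and 2 channels; prune the bottom 40% globally.
masks = select_channels([[0.9, 0.01, 0.5], [0.02, 0.7]], prune_ratio=0.4)
# masks -> [[True, False, True], [False, True]]
```

In the actual method the masks would then drive physical removal of the corresponding conv filters, followed by fine-tuning.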

BigLittleDecoder (85 stars, 10 forks)

[NeurIPS'23] Speculative Decoding with Big Little Decoder

Q-LLM (32 stars, 1 fork)

Official repository for "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models".

flux-fp8-api (264 stars, 37 forks)

Flux diffusion model implementation using quantized fp8 matmul; the remaining layers use faster half-precision accumulation, giving roughly a 2x speedup on consumer devices.
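The essence of an fp8 matmul is that the inputs are rounded to a low-precision format (e4m3 keeps only 3 mantissa bits) while the accumulation happens at higher precision. A minimal pure-Python sketch of that rounding step, which emulates only the mantissa truncation and ignores fp8's exponent range and saturation; all names here are illustrative, not the repo's API:

```python
import math

def fake_fp8(x, mantissa_bits=3):
    """Round x to a float keeping `mantissa_bits` of mantissa (e4m3-like).

    Emulates only mantissa rounding; real fp8 also clamps the exponent
    range and saturates at a maximum value (448 for e4m3).
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                 # x = m * 2**e, with 0.5 <= |m| < 1
    scale = 2 ** (mantissa_bits + 1)
    return round(m * scale) / scale * 2 ** e

def matmul_fp8(a, b):
    """Matmul with fp8-rounded inputs and full-precision accumulation."""
    aq = [[fake_fp8(x) for x in row] for row in a]
    bq = [[fake_fp8(x) for x in row] for row in b]
    return [[sum(x * y for x, y in zip(row, col))
             for col in zip(*bq)] for row in aq]

# 3.3 is not representable with 3 mantissa bits and rounds to 3.25:
# fake_fp8(3.3) -> 3.25
```

On real hardware the speedup comes from fp8 tensor-core throughput, not from the rounding itself; this sketch only shows the precision behavior.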

Speculative-Decoding (18 stars, 0 forks)

Implementation of "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023).
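In speculative decoding, a cheap draft model proposes a block of tokens autoregressively, the expensive target model verifies the whole block in one pass, and the first disagreement truncates the block. A toy sketch of the greedy special case (the paper's actual accept/reject rule is a rejection-sampling scheme over the two models' distributions); both "models" here are stand-in next-token functions, and all names are illustrative:

```python
# Toy greedy sketch of speculative decoding. `target` and `draft` are
# stand-in next-token functions (context -> token); the accept rule below
# is the greedy special case of the paper's rejection-sampling scheme.

def speculative_decode(target, draft, prompt, n_tokens, block=4):
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # 1) Draft proposes `block` tokens autoregressively (cheap).
        proposed, ctx = [], list(out)
        for _ in range(block):
            t = draft(ctx)
            proposed.append(t)
            ctx.append(t)
        # 2) Target verifies the proposals (in a real model: one parallel pass).
        accepted, ctx = [], list(out)
        for t in proposed:
            if target(ctx) != t:
                break                    # first mismatch truncates the block
            accepted.append(t)
            ctx.append(t)
        # 3) Keep the accepted prefix, then emit one token from the target.
        out += accepted
        out.append(target(out))
    return out[:len(prompt) + n_tokens]

# Toy models: the target counts up mod 10; the draft agrees except when
# the context length is a multiple of 5, where it guesses wrong.
target = lambda ctx: (ctx[-1] + 1) % 10
draft = lambda ctx: (ctx[-1] + 1) % 10 if len(ctx) % 5 else 0

result = speculative_decode(target, draft, [3], n_tokens=5, block=2)
# result -> [3, 4, 5, 6, 7, 8], identical to decoding with `target` alone
```

The key property the sketch preserves: the output always matches what the target model alone would have produced; the draft only changes how many target evaluations are needed per emitted token.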