fast-inference topic: repositories
pytorch-slimming (562 stars, 95 forks)
Learning Efficient Convolutional Networks through Network Slimming, ICCV 2017.
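The network-slimming paper above prunes convolutional channels whose batch-norm scaling factors (trained with an L1 penalty) fall below a global threshold. A minimal sketch of that channel-selection step, with hypothetical toy values standing in for trained BN scales:

```python
def slim_channels(gammas, prune_ratio=0.5):
    """Select channels to keep, per layer, by global BN-scale threshold.

    gammas: list of per-layer lists of batch-norm scaling factors
            (hypothetical values here; the paper trains them with an
            L1 sparsity penalty so unimportant channels shrink to ~0).
    prune_ratio: fraction of channels, across all layers, to prune.
    """
    # Pool all |gamma| values network-wide and pick a global threshold.
    flat = sorted(abs(g) for layer in gammas for g in layer)
    thresh = flat[int(prune_ratio * len(flat))]
    # Keep the indices of channels whose scale survives the threshold.
    return [[i for i, g in enumerate(layer) if abs(g) >= thresh]
            for layer in gammas]

# Toy example: two layers, half the channels have near-zero scales.
print(slim_channels([[0.9, 0.01, 0.5], [0.02, 0.8, 0.03]]))
# → [[0, 2], [1]]
```

The actual repo then rebuilds a narrower network from the kept indices and fine-tunes it; this sketch only covers the selection rule.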
BigLittleDecoder (85 stars, 10 forks)
[NeurIPS'23] Speculative Decoding with Big Little Decoder
Q-LLM (32 stars, 1 fork)
Official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
flux-fp8-api (264 stars, 37 forks)
Flux diffusion model implementation using quantized fp8 matmuls, with the remaining layers using faster half-precision accumulation; roughly 2x faster on consumer devices.
Speculative-Decoding (18 stars, 0 forks)
Implementation of "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023).
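Two of the repositories above (BigLittleDecoder, Speculative-Decoding) implement speculative decoding: a cheap draft model proposes several tokens, and the expensive target model verifies them, accepting the agreeing prefix. A minimal greedy-decoding sketch with hypothetical toy models in place of real LMs (a real implementation verifies all draft tokens in one batched target forward pass and, for sampling, uses the paper's acceptance-rejection rule):

```python
def draft_next(prefix):
    # Hypothetical cheap model: next token = (last + 1) % 10,
    # except it is wrong after a 7 (emits 0 instead of 8).
    last = prefix[-1]
    return 0 if last == 7 else (last + 1) % 10

def target_next(prefix):
    # Hypothetical expensive model: next token is always (last + 1) % 10.
    return (prefix[-1] + 1) % 10

def speculative_decode(prefix, n_new, k=4):
    """Generate n_new tokens, verifying k-token draft runs per step."""
    out = list(prefix)
    while len(out) < len(prefix) + n_new:
        # 1) Draft k tokens with the cheap model.
        draft, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Verify: accept the longest prefix the target model agrees
        #    with (one batched target forward pass in a real system).
        ctx = list(out)
        for t in draft:
            if target_next(ctx) != t:
                break
            ctx.append(t)
        out = ctx
        # 3) On mismatch (or exhausted draft) emit one target token, so
        #    progress is guaranteed and output equals pure target decoding.
        if len(out) < len(prefix) + n_new:
            out.append(target_next(out))
    return out[:len(prefix) + n_new]

print(speculative_decode([5], 8))
# → [5, 6, 7, 8, 9, 0, 1, 2, 3]
```

The output is identical to decoding with `target_next` alone; the speedup comes from the target model checking k draft tokens per call instead of producing one token per call.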