fast-inference topic


pytorch-slimming (562 stars, 95 forks)

Learning Efficient Convolutional Networks through Network Slimming (ICCV 2017).
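The core idea of network slimming is to train with an L1 penalty on the BatchNorm scale factors (gamma) and then prune the channels whose scales fall below a global percentile threshold. A minimal sketch of that channel-selection step, with illustrative names and plain Python lists standing in for the per-layer gamma tensors:

```python
# Hypothetical sketch of the channel-selection step in network slimming:
# channels whose BatchNorm scale |gamma| falls below a global percentile
# threshold are pruned. Function name and threshold rule are illustrative.

def select_channels(gammas, prune_ratio):
    """Return per-layer boolean keep-masks given BN gamma values per layer."""
    flat = sorted(abs(g) for layer in gammas for g in layer)
    # Global threshold: prune the smallest `prune_ratio` fraction of channels.
    threshold = flat[int(prune_ratio * len(flat))]
    return [[abs(g) >= threshold for g in layer] for layer in gammas]

# Two layers with 3 and 2 channels; prune the bottom 40% globally.
masks = select_channels([[0.9, 0.01, 0.5], [0.02, 0.7]], prune_ratio=0.4)
# masks -> [[True, False, True], [False, True]]
```

In the actual method the masks would then drive physical removal of the corresponding conv filters, followed by fine-tuning.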

BigLittleDecoder (85 stars, 10 forks)

[NeurIPS'23] Speculative Decoding with Big Little Decoder

Q-LLM (32 stars, 1 fork)

Official repository for "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models".

flux-fp8-api (264 stars, 37 forks)

Flux diffusion model implementation using quantized fp8 matmul; the remaining layers use faster half-precision accumulation, giving roughly a 2x speedup on consumer devices.
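The essence of an fp8 matmul is that the inputs are rounded to a low-precision format (e4m3 keeps only 3 mantissa bits) while the accumulation happens at higher precision. A minimal pure-Python sketch of that rounding step, which emulates only the mantissa truncation and ignores fp8's exponent range and saturation; all names here are illustrative, not the repo's API:

```python
import math

def fake_fp8(x, mantissa_bits=3):
    """Round x to a float keeping `mantissa_bits` of mantissa (e4m3-like).

    Emulates only mantissa rounding; real fp8 also clamps the exponent
    range and saturates at a maximum value (448 for e4m3).
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                 # x = m * 2**e, with 0.5 <= |m| < 1
    scale = 2 ** (mantissa_bits + 1)
    return round(m * scale) / scale * 2 ** e

def matmul_fp8(a, b):
    """Matmul with fp8-rounded inputs and full-precision accumulation."""
    aq = [[fake_fp8(x) for x in row] for row in a]
    bq = [[fake_fp8(x) for x in row] for row in b]
    return [[sum(x * y for x, y in zip(row, col))
             for col in zip(*bq)] for row in aq]

# 3.3 is not representable with 3 mantissa bits and rounds to 3.25:
# fake_fp8(3.3) -> 3.25
```

On real hardware the speedup comes from fp8 tensor-core throughput, not from the rounding itself; this sketch only shows the precision behavior.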

Speculative-Decoding (18 stars, 0 forks)

Implementation of "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023).
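In speculative decoding, a cheap draft model proposes a block of tokens autoregressively, the expensive target model verifies the whole block in one pass, and the first disagreement truncates the block. A toy sketch of the greedy special case (the paper's actual accept/reject rule is a rejection-sampling scheme over the two models' distributions); both "models" here are stand-in next-token functions, and all names are illustrative:

```python
# Toy greedy sketch of speculative decoding. `target` and `draft` are
# stand-in next-token functions (context -> token); the accept rule below
# is the greedy special case of the paper's rejection-sampling scheme.

def speculative_decode(target, draft, prompt, n_tokens, block=4):
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # 1) Draft proposes `block` tokens autoregressively (cheap).
        proposed, ctx = [], list(out)
        for _ in range(block):
            t = draft(ctx)
            proposed.append(t)
            ctx.append(t)
        # 2) Target verifies the proposals (in a real model: one parallel pass).
        accepted, ctx = [], list(out)
        for t in proposed:
            if target(ctx) != t:
                break                    # first mismatch truncates the block
            accepted.append(t)
            ctx.append(t)
        # 3) Keep the accepted prefix, then emit one token from the target.
        out += accepted
        out.append(target(out))
    return out[:len(prompt) + n_tokens]

# Toy models: the target counts up mod 10; the draft agrees except when
# the context length is a multiple of 5, where it guesses wrong.
target = lambda ctx: (ctx[-1] + 1) % 10
draft = lambda ctx: (ctx[-1] + 1) % 10 if len(ctx) % 5 else 0

result = speculative_decode(target, draft, [3], n_tokens=5, block=2)
# result -> [3, 4, 5, 6, 7, 8], identical to decoding with `target` alone
```

The key property the sketch preserves: the output always matches what the target model alone would have produced; the draft only changes how many target evaluations are needed per emitted token.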