inference-acceleration topic
inference-acceleration repositories
nos
Stars: 126, Forks: 10
⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud or AI HW.
AsyncDiff
Stars: 146, Forks: 8
[NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
Q-LLM
Stars: 32, Forks: 1
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
twigvlm
Stars: 20, Forks: 2, Watchers: 20
Implementation of ICCV 2025 paper "Growing a Twig to Accelerate Large Vision-Language Models".
TeaCache
Stars: 1.2k, Forks: 48, Watchers: 1.2k
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model