inference-acceleration topic

Repositories tagged with the inference-acceleration topic

nos

126 stars, 10 forks

⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud, or on AI hardware.
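
Below is a minimal sketch of the kind of local PyTorch model inference such a server wraps. It is a generic illustration using standard torch/torchvision APIs; the model choice, batch size, and input shape are assumptions, and it does not use nos's own client or server interface.

# Minimal sketch of local PyTorch inference of the kind an inference server
# would wrap. Generic torch/torchvision code, not the nos API; the model,
# batch size, and input shape are assumptions.
import torch
from torchvision.models import resnet50, ResNet50_Weights

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load a pretrained model and switch it to evaluation mode.
weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights).eval().to(device)

# Dummy batch standing in for preprocessed request payloads.
batch = torch.randn(8, 3, 224, 224, device=device)

with torch.inference_mode():
    logits = model(batch)
    predictions = logits.argmax(dim=1)

print(predictions.tolist())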

AsyncDiff

146 stars, 8 forks

[NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
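
A toy sketch of the asynchronous-denoising idea follows: the denoiser is split into sequential blocks, and a later block consumes the earlier block's activation from the previous step, removing the per-step dependency so the blocks could run concurrently on different devices. This is a conceptual illustration in plain PyTorch with made-up toy dimensions and step count, not AsyncDiff's implementation.

# Toy sketch of asynchronous denoising: split the denoiser into sequential
# blocks and let the later block reuse the earlier block's output from the
# previous step (a stale but similar input). Conceptual illustration only,
# not AsyncDiff's code; dimensions, step count, and update rule are arbitrary.
import torch
import torch.nn as nn

torch.manual_seed(0)
torch.set_grad_enabled(False)  # inference only
dim, steps = 64, 10

# Two stand-in blocks of a denoiser (e.g. halves of a UNet).
block1 = nn.Sequential(nn.Linear(dim, dim), nn.GELU())
block2 = nn.Linear(dim, dim)

x = torch.randn(1, dim)

# Sequential baseline: block2 waits for block1 at every step.
x_seq = x.clone()
for t in range(steps):
    h = block1(x_seq)
    x_seq = x_seq - 0.1 * block2(h)

# Asynchronous schedule: block2 uses block1's activation from the previous
# step, so the two blocks no longer have to run one after the other.
x_async = x.clone()
h_prev = block1(x_async)              # warm-up step computed sequentially
for t in range(steps):
    h_cur = block1(x_async)           # could run concurrently with the line below
    x_async = x_async - 0.1 * block2(h_prev)  # block2 consumes the stale activation
    h_prev = h_cur

print("difference between schedules:", (x_seq - x_async).norm().item())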

Q-LLM

32 stars, 1 fork

The official repository for "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models".
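
A toy sketch of query-aware context selection in this spirit follows: cached context blocks are scored against the current query and only the top-k blocks are attended to, shrinking the effective context. This is a plain-PyTorch illustration with arbitrary block size and k, not the Q-LLM implementation.

# Toy sketch of query-aware context selection: summarize each block of the
# cached context, rank blocks by similarity to the current query, and attend
# only over the top-k blocks. Illustration only, not Q-LLM's code; block size,
# k, and dimensions are arbitrary.
import torch

torch.manual_seed(0)
d, n_ctx, block, k = 32, 1024, 64, 4

keys = torch.randn(n_ctx, d)      # cached keys for a long context
values = torch.randn(n_ctx, d)    # cached values
query = torch.randn(d)            # current query vector

# Summarize each context block by its mean key, then rank blocks by
# similarity to the query.
block_keys = keys.view(-1, block, d).mean(dim=1)   # (n_blocks, d)
scores = block_keys @ query                        # (n_blocks,)
top_blocks = scores.topk(k).indices

# Gather only the selected blocks and run standard attention over them,
# shrinking the attended context from n_ctx to k * block tokens.
idx = (top_blocks[:, None] * block + torch.arange(block)).flatten()
attn = torch.softmax((keys[idx] @ query) / d ** 0.5, dim=0)
output = attn @ values[idx]
print(output.shape)  # torch.Size([32])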