inference-acceleration topic

Repositories tagged with the inference-acceleration topic

nos

126 stars, 10 forks

⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud, or on AI hardware.
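
Below is a minimal sketch of the kind of local PyTorch model inference such a server wraps. It is a generic illustration using standard torch/torchvision APIs; the model choice, batch size, and input shape are assumptions, and it does not use nos's own client or server interface.

# Minimal sketch of local PyTorch inference of the kind an inference server
# would wrap. Generic torch/torchvision code, not the nos API; the model,
# batch size, and input shape are assumptions.
import torch
from torchvision.models import resnet50, ResNet50_Weights

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load a pretrained model and switch it to evaluation mode.
weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights).eval().to(device)

# Dummy batch standing in for preprocessed request payloads.
batch = torch.randn(8, 3, 224, 224, device=device)

with torch.inference_mode():
    logits = model(batch)
    predictions = logits.argmax(dim=1)

print(predictions.tolist())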

AsyncDiff

146 stars, 8 forks

[NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
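
A toy sketch of the asynchronous-denoising idea follows: the denoiser is split into sequential blocks, and a later block consumes the earlier block's activation from the previous step, removing the per-step dependency so the blocks could run concurrently on different devices. This is a conceptual illustration in plain PyTorch with made-up toy dimensions and step count, not AsyncDiff's implementation.

# Toy sketch of asynchronous denoising: split the denoiser into sequential
# blocks and let the later block reuse the earlier block's output from the
# previous step (a stale but similar input). Conceptual illustration only,
# not AsyncDiff's code; dimensions, step count, and update rule are arbitrary.
import torch
import torch.nn as nn

torch.manual_seed(0)
torch.set_grad_enabled(False)  # inference only
dim, steps = 64, 10

# Two stand-in blocks of a denoiser (e.g. halves of a UNet).
block1 = nn.Sequential(nn.Linear(dim, dim), nn.GELU())
block2 = nn.Linear(dim, dim)

x = torch.randn(1, dim)

# Sequential baseline: block2 waits for block1 at every step.
x_seq = x.clone()
for t in range(steps):
    h = block1(x_seq)
    x_seq = x_seq - 0.1 * block2(h)

# Asynchronous schedule: block2 uses block1's activation from the previous
# step, so the two blocks no longer have to run one after the other.
x_async = x.clone()
h_prev = block1(x_async)              # warm-up step computed sequentially
for t in range(steps):
    h_cur = block1(x_async)           # could run concurrently with the line below
    x_async = x_async - 0.1 * block2(h_prev)  # block2 consumes the stale activation
    h_prev = h_cur

print("difference between schedules:", (x_seq - x_async).norm().item())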

Q-LLM

32 stars, 1 fork

The official repository for "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models".
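
A toy sketch of query-aware context selection in this spirit follows: cached context blocks are scored against the current query and only the top-k blocks are attended to, shrinking the effective context. This is a plain-PyTorch illustration with arbitrary block size and k, not the Q-LLM implementation.

# Toy sketch of query-aware context selection: summarize each block of the
# cached context, rank blocks by similarity to the current query, and attend
# only over the top-k blocks. Illustration only, not Q-LLM's code; block size,
# k, and dimensions are arbitrary.
import torch

torch.manual_seed(0)
d, n_ctx, block, k = 32, 1024, 64, 4

keys = torch.randn(n_ctx, d)      # cached keys for a long context
values = torch.randn(n_ctx, d)    # cached values
query = torch.randn(d)            # current query vector

# Summarize each context block by its mean key, then rank blocks by
# similarity to the query.
block_keys = keys.view(-1, block, d).mean(dim=1)   # (n_blocks, d)
scores = block_keys @ query                        # (n_blocks,)
top_blocks = scores.topk(k).indices

# Gather only the selected blocks and run standard attention over them,
# shrinking the attended context from n_ctx to k * block tokens.
idx = (top_blocks[:, None] * block + torch.arange(block)).flatten()
attn = torch.softmax((keys[idx] @ query) / d ** 0.5, dim=0)
output = attn @ values[idx]
print(output.shape)  # torch.Size([32])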