efficient-inference topic

Public repositories matching the efficient-inference topic:

Multistage_Pruning (16 stars, 3 forks)

Cheng-Hao Tu, Jia-Hong Lee, Yi-Ming Chan and Chu-Song Chen, "Pruning Depthwise Separable Convolutions for MobileNet Compression," International Joint Conference on Neural Networks, IJCNN 2020, July 2020.
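The technique named here, pruning channels out of the depthwise separable blocks that make up MobileNet, can be illustrated with a short PyTorch sketch. This is a minimal magnitude-based version, not the paper's multistage procedure, and `prune_separable_block` is an illustrative name:

```python
import torch
import torch.nn as nn

def prune_separable_block(pointwise: nn.Conv2d, keep_ratio: float = 0.5) -> nn.Conv2d:
    # Score each output channel of the 1x1 (pointwise) conv by L1 weight magnitude.
    scores = pointwise.weight.abs().sum(dim=(1, 2, 3))
    n_keep = max(1, int(keep_ratio * scores.numel()))
    keep = torch.topk(scores, n_keep).indices.sort().values

    # Rebuild a narrower 1x1 conv holding only the surviving channels.
    pruned = nn.Conv2d(pointwise.in_channels, n_keep,
                       kernel_size=1, bias=pointwise.bias is not None)
    pruned.weight.data = pointwise.weight.data[keep].clone()
    if pointwise.bias is not None:
        pruned.bias.data = pointwise.bias.data[keep].clone()
    return pruned

pw = nn.Conv2d(32, 64, kernel_size=1)          # pointwise half of a separable block
print(prune_separable_block(pw).weight.shape)  # torch.Size([32, 32, 1, 1])
```

In a real network, shrinking these outputs also requires slicing the input channels of the layer that follows, which is where the per-layer bookkeeping of a full pruning method comes in.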

KVQuant (286 stars, 25 forks)

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
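The trick named in the title is quantizing the attention KV cache to low bit widths so that very long contexts fit in memory. The sketch below shows the basic quantize/dequantize round trip, with per-channel key scales and per-token value scales as in the paper's high-level design; KVQuant's outlier isolation and pre-RoPE key quantization are omitted, and all names are illustrative:

```python
import torch

def quantize(x: torch.Tensor, bits: int, dim: int):
    # Asymmetric uniform quantization; `dim` is reduced away, giving one
    # (scale, zero-point) pair per remaining slice. 4-bit codes are stored
    # unpacked in uint8 here for simplicity.
    qmax = 2 ** bits - 1
    lo = x.amin(dim=dim, keepdim=True)
    scale = (x.amax(dim=dim, keepdim=True) - lo).clamp(min=1e-8) / qmax
    q = ((x - lo) / scale).round().clamp(0, qmax).to(torch.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    return q.float() * scale + lo

k = torch.randn(1, 8, 4096, 64)           # (batch, heads, tokens, head_dim)
v = torch.randn(1, 8, 4096, 64)
qk, ks, kz = quantize(k, bits=4, dim=-2)  # keys: one scale per channel
qv, vs, vz = quantize(v, bits=4, dim=-1)  # values: one scale per token
print((dequantize(qk, ks, kz) - k).abs().mean())  # reconstruction error
```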

fast_robust_early_exit (50 stars, 7 forks)

Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)
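Early exiting lets a decoder emit a token from an intermediate layer once it is confident enough, skipping the remaining layers for that step. The sketch below shows only that confidence test for a single decoding step; `layers`, `norm`, and `lm_head` are hypothetical stand-ins, and the repository's synchronized parallel decoding is not shown:

```python
import torch

@torch.no_grad()
def early_exit_step(hidden, layers, norm, lm_head, threshold=0.9):
    # hidden: (batch=1, seq, d_model); returns (token_id, layers_used).
    for depth, layer in enumerate(layers, start=1):
        hidden = layer(hidden)
        logits = lm_head(norm(hidden[:, -1]))        # project intermediate state
        conf, token = torch.softmax(logits, -1).max(dim=-1)
        if conf.item() >= threshold:                 # confident enough: exit early
            return token.item(), depth
    return token.item(), depth                       # fell through all layers
```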

picollm (162 stars, 6 forks)

On-device LLM Inference Powered by X-Bit Quantization
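picoLLM's compression scheme is proprietary, so the sketch below is only a generic illustration of what "x-bit" weight quantization means: rounding weights onto a signed grid of arbitrary bit width with one scale per output row. It does not reflect picoLLM's actual algorithm or SDK:

```python
import torch

def quantize_x_bit(w: torch.Tensor, bits: int):
    # Symmetric quantization to any bit width, one scale per output row.
    qmax = 2 ** (bits - 1) - 1                     # e.g. 3 for 3-bit signed
    scale = w.abs().amax(dim=1, keepdim=True) / qmax
    q = (w / scale).round().clamp(-qmax - 1, qmax).to(torch.int8)
    return q, scale                                # dequantize as q * scale

w = torch.randn(4, 16)
q, s = quantize_x_bit(w, bits=3)
print((w - q.float() * s).abs().max())             # worst-case rounding error
```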

AsyncDiff (146 stars, 8 forks)

[NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
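The idea in the title, breaking the strictly sequential denoising chain so pieces of the model can run concurrently, can be simulated in a few lines: split the denoiser into stages and let each stage consume its predecessor's output from the previous denoising step. This toy single-process loop only demonstrates the schedule; it is not AsyncDiff's actual UNet sharding, and every name below is made up:

```python
import torch
import torch.nn as nn

stages = nn.ModuleList(nn.Linear(16, 16) for _ in range(3))  # stand-in denoiser chunks
x = torch.randn(1, 16)                                       # noisy latent
cache = [x.clone() for _ in stages]                          # stale inter-stage outputs

for step in range(10):
    # Stage i reads stage i-1's output from the *previous* step, so all
    # stages are data-independent within a step and could run in parallel
    # on different devices.
    cache = [stages[i](x if i == 0 else cache[i - 1]) for i in range(len(stages))]
    x = x - 0.1 * cache[-1]      # toy update standing in for a scheduler step
```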

AutoVP (16 stars, 2 forks)

[ICLR24] AutoVP: An Automated Visual Prompting Framework and Benchmark
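Visual prompting, the primitive AutoVP automates, trains a small set of input pixels while the pretrained classifier stays frozen. The sketch below shows one common form, a learnable border frame; AutoVP's actual contribution, jointly tuning prompt size, input scaling, and the output label mapping, is not reproduced, and `PadPrompt` is a made-up name:

```python
import torch
import torch.nn as nn

class PadPrompt(nn.Module):
    """Learnable pixel frame pasted over the border of each input image."""
    def __init__(self, image_size: int = 224, pad: int = 16):
        super().__init__()
        self.prompt = nn.Parameter(torch.zeros(1, 3, image_size, image_size))
        mask = torch.zeros(1, 1, image_size, image_size)
        mask[..., :pad, :] = 1.0
        mask[..., -pad:, :] = 1.0
        mask[..., :, :pad] = 1.0
        mask[..., :, -pad:] = 1.0
        self.register_buffer("mask", mask)   # 1 on the border, 0 inside

    def forward(self, x):
        # Keep the image interior, substitute learnable pixels on the border.
        return x * (1 - self.mask) + self.prompt * self.mask

prompt = PadPrompt()
x = torch.randn(8, 3, 224, 224)
print(prompt(x).shape)   # torch.Size([8, 3, 224, 224]); only `prompt` trains
```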