efficient-inference topic

A list of repositories tagged with the efficient-inference topic.

Multistage_Pruning

16 stars · 3 forks

Cheng-Hao Tu, Jia-Hong Lee, Yi-Ming Chan and Chu-Song Chen, "Pruning Depthwise Separable Convolutions for MobileNet Compression," International Joint Conference on Neural Networks, IJCNN 2020, July 20...
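
Below is a minimal sketch of the kind of channel pruning this paper applies to depthwise separable blocks: score the pointwise layer's input channels by L1 weight magnitude, keep the strongest, and slice both convolutions to match. The scoring rule and single-stage layout are illustrative assumptions, not the repository's actual multistage procedure.

```python
import torch
import torch.nn as nn

def prune_separable_block(depthwise: nn.Conv2d, pointwise: nn.Conv2d,
                          keep_ratio: float = 0.75):
    """Keep the channels with the largest L1 norm in the pointwise weights,
    and slice the depthwise conv to match. Returns new layers + kept indices."""
    # pointwise.weight: (out_ch, in_ch, 1, 1) -> one score per input channel
    scores = pointwise.weight.abs().sum(dim=(0, 2, 3))
    n_keep = max(1, int(keep_ratio * scores.numel()))
    keep = torch.topk(scores, n_keep).indices.sort().values

    new_dw = nn.Conv2d(n_keep, n_keep, depthwise.kernel_size,
                       stride=depthwise.stride, padding=depthwise.padding,
                       groups=n_keep)
    new_pw = nn.Conv2d(n_keep, pointwise.out_channels, 1)
    with torch.no_grad():
        new_dw.weight.copy_(depthwise.weight[keep])
        new_dw.bias.copy_(depthwise.bias[keep])
        new_pw.weight.copy_(pointwise.weight[:, keep])
        new_pw.bias.copy_(pointwise.bias)
    return new_dw, new_pw, keep

# Toy usage: a MobileNet-style block with 32 channels pruned to 24.
dw, pw = nn.Conv2d(32, 32, 3, padding=1, groups=32), nn.Conv2d(32, 64, 1)
dw2, pw2, keep = prune_separable_block(dw, pw)
x = torch.randn(1, 32, 56, 56)
y = pw2(dw2(x[:, keep]))   # (1, 64, 56, 56) from a slimmer block
```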

KVQuant

286 stars · 25 forks

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
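
A minimal sketch of the KV-cache quantization idea: keys quantized per-channel and values per-token to a low-bit integer range with a float scale, dequantized at attention time. This shows the general mechanism only; KVQuant's actual scheme (non-uniform quantization, outlier handling, and so on) is more involved.

```python
import torch

def quantize(x: torch.Tensor, bits: int = 4, dim: int = -1):
    """Symmetric uniform quantization along `dim`: int8 codes + float scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().amax(dim=dim, keepdim=True).clamp_min(1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

# Toy KV cache: (batch, heads, seq_len, head_dim)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)
k_q, k_s = quantize(k, dim=2)   # per-channel keys: scale shared across tokens
v_q, v_s = quantize(v, dim=3)   # per-token values: scale shared across channels
print((dequantize(k_q, k_s) - k).abs().mean())  # small reconstruction error
```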

fast_robust_early_exit

50 stars · 7 forks

Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)
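
A minimal sketch of confidence-based early exiting in a decoder: after each layer, project the hidden state through the LM head and stop once the top-token probability clears a threshold. The toy layer stack and threshold rule are assumptions for illustration; the synchronized parallel decoding that gives this framework its speed is not shown.

```python
import torch
import torch.nn as nn

d_model, vocab, n_layers = 64, 100, 6
layers = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
     for _ in range(n_layers)])
lm_head = nn.Linear(d_model, vocab)

def forward_with_early_exit(h: torch.Tensor, threshold: float = 0.9):
    """Returns (next-token logits, number of layers actually executed)."""
    for i, layer in enumerate(layers):
        h = layer(h)
        logits = lm_head(h[:, -1])           # predict from the last position
        conf = logits.softmax(-1).max(-1).values
        if conf.item() >= threshold:         # confident enough: exit early
            return logits, i + 1
    return logits, n_layers                  # fell through: used every layer

h = torch.randn(1, 10, d_model)              # embeddings of a 10-token prefix
logits, used = forward_with_early_exit(h)
print(f"exited after {used}/{n_layers} layers")
```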

picollm

273 stars · 15 forks

On-device LLM Inference Powered by X-Bit Quantization
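
Low-bit on-device inference depends on storage tricks like the one sketched below: quantize weights to 4 bits and pack two values per byte. "X-bit" refers to picoLLM's own scheme; this uniform 4-bit packing is just an illustrative stand-in.

```python
import torch

def pack_int4(q: torch.Tensor) -> torch.Tensor:
    """q holds integers in [-8, 7]; pack pairs of values into uint8 bytes."""
    u = (q + 8).to(torch.uint8).flatten()        # shift to [0, 15]
    if u.numel() % 2:
        u = torch.cat([u, u.new_zeros(1)])       # pad to an even count
    return (u[0::2] << 4) | u[1::2]

def unpack_int4(packed: torch.Tensor, n: int) -> torch.Tensor:
    hi = (packed >> 4).to(torch.int8) - 8
    lo = (packed & 0xF).to(torch.int8) - 8
    return torch.stack([hi, lo], dim=1).flatten()[:n]

w = torch.randn(256, 256)
scale = w.abs().max() / 7
q = torch.clamp(torch.round(w / scale), -8, 7)
packed = pack_int4(q)                            # 32 KB instead of 256 KB fp32
w_hat = unpack_int4(packed, w.numel()).float().view_as(w) * scale
print((w - w_hat).abs().mean())                  # quantization error
```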

AsyncDiff

146 stars · 8 forks

[NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
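
A toy, single-process simulation of the asynchronous idea: the denoiser is split into sequential stages, and at each timestep stage i consumes what stage i-1 produced at the previous timestep. That staggered dependency is what would let the stages run concurrently on separate devices; this sketch models only the dataflow, not AsyncDiff's actual pipeline.

```python
import torch
import torch.nn as nn

stages = nn.ModuleList([nn.Linear(32, 32) for _ in range(3)])  # denoiser split in 3
x = torch.randn(1, 32)                         # noisy latent
cache = [x.clone() for _ in stages]            # stale inter-stage activations

for t in range(10):                            # denoising steps
    # All stages could execute in parallel here: each reads only cached
    # previous-step input, never an output produced during this step.
    cache = [stages[i](cache[i - 1] if i else x) for i in range(len(stages))]
    eps = cache[-1]                            # predicted noise
    x = x - 0.1 * eps                          # toy update rule, not a real sampler
```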

AutoVP

16 stars · 2 forks

[ICLR24] AutoVP: An Automated Visual Prompting Framework and Benchmark
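
A minimal sketch of pad-style visual prompting, one of the designs such a framework searches over: a frozen classifier receives the image with a learnable border, and only the border pixels are trained. AutoVP additionally automates prompt design and output-label mapping; none of that tuning is shown here.

```python
import torch
import torch.nn as nn

class PadPrompt(nn.Module):
    def __init__(self, image_size: int = 224, pad: int = 16):
        super().__init__()
        self.prompt = nn.Parameter(torch.zeros(3, image_size, image_size))
        mask = torch.zeros(1, image_size, image_size)
        mask[:, :pad] = 1; mask[:, -pad:] = 1        # top and bottom borders
        mask[:, :, :pad] = 1; mask[:, :, -pad:] = 1  # left and right borders
        self.register_buffer("mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Interior pixels pass through; border pixels come from the prompt.
        return x * (1 - self.mask) + self.prompt * self.mask

prompt = PadPrompt()
x = torch.randn(4, 3, 224, 224)
x_prompted = prompt(x)
# Training would optimize prompt.prompt while the backbone stays frozen.
```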

VidCom2

38 stars · 1 fork

[EMNLP 2025 Main] Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models
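
A minimal sketch of training-free visual-token compression for video LLMs: per frame, tokens whose features sit closest to the frame's mean are treated as redundant and dropped before the language model sees them. The scoring rule here is an illustrative assumption, not VidCom2's actual mechanism.

```python
import torch

def compress_video_tokens(tokens: torch.Tensor, keep_ratio: float = 0.25):
    """tokens: (frames, tokens_per_frame, dim) -> (frames, kept, dim)."""
    mean = tokens.mean(dim=1, keepdim=True)
    # Distinctiveness score: low similarity to the frame mean => keep.
    score = -torch.cosine_similarity(tokens, mean, dim=-1)
    n_keep = max(1, int(keep_ratio * tokens.shape[1]))
    idx = score.topk(n_keep, dim=1).indices.sort(dim=1).values
    return torch.gather(
        tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1]))

vid = torch.randn(16, 576, 1024)      # 16 frames of 24x24 ViT patch tokens
compact = compress_video_tokens(vid)  # (16, 144, 1024): 4x fewer tokens
```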

DGMR

21 stars · 0 forks

The official implementation of "Diversity-Guided MLP Reduction for Efficient Large Vision Transformers"

LLaVA-STF

29 stars · 2 forks

The official implementation of "Learning Compact Vision Tokens for Efficient Large Multimodal Models"