efficient-inference topic

A list of repositories tagged with the efficient-inference topic.

Multistage_Pruning

16 stars · 3 forks

Cheng-Hao Tu, Jia-Hong Lee, Yi-Ming Chan and Chu-Song Chen, "Pruning Depthwise Separable Convolutions for MobileNet Compression," International Joint Conference on Neural Networks, IJCNN 2020, July 20...
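
Below is a minimal sketch of the kind of channel pruning this paper applies to depthwise separable blocks: score the pointwise layer's input channels by L1 weight magnitude, keep the strongest, and slice both convolutions to match. The scoring rule and single-stage layout are illustrative assumptions, not the repository's actual multistage procedure.

```python
import torch
import torch.nn as nn

def prune_separable_block(depthwise: nn.Conv2d, pointwise: nn.Conv2d,
                          keep_ratio: float = 0.75):
    """Keep the channels with the largest L1 norm in the pointwise weights,
    and slice the depthwise conv to match. Returns new layers + kept indices."""
    # pointwise.weight: (out_ch, in_ch, 1, 1) -> one score per input channel
    scores = pointwise.weight.abs().sum(dim=(0, 2, 3))
    n_keep = max(1, int(keep_ratio * scores.numel()))
    keep = torch.topk(scores, n_keep).indices.sort().values

    new_dw = nn.Conv2d(n_keep, n_keep, depthwise.kernel_size,
                       stride=depthwise.stride, padding=depthwise.padding,
                       groups=n_keep)
    new_pw = nn.Conv2d(n_keep, pointwise.out_channels, 1)
    with torch.no_grad():
        new_dw.weight.copy_(depthwise.weight[keep])
        new_dw.bias.copy_(depthwise.bias[keep])
        new_pw.weight.copy_(pointwise.weight[:, keep])
        new_pw.bias.copy_(pointwise.bias)
    return new_dw, new_pw, keep

# Toy usage: a MobileNet-style block with 32 channels pruned to 24.
dw, pw = nn.Conv2d(32, 32, 3, padding=1, groups=32), nn.Conv2d(32, 64, 1)
dw2, pw2, keep = prune_separable_block(dw, pw)
x = torch.randn(1, 32, 56, 56)
y = pw2(dw2(x[:, keep]))   # (1, 64, 56, 56) from a slimmer block
```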

KVQuant

286 stars · 25 forks

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
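
A minimal sketch of the KV-cache quantization idea: keys quantized per-channel and values per-token to a low-bit integer range with a float scale, dequantized at attention time. This shows the general mechanism only; KVQuant's actual scheme (non-uniform quantization, outlier handling, and so on) is more involved.

```python
import torch

def quantize(x: torch.Tensor, bits: int = 4, dim: int = -1):
    """Symmetric uniform quantization along `dim`: int8 codes + float scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().amax(dim=dim, keepdim=True).clamp_min(1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

# Toy KV cache: (batch, heads, seq_len, head_dim)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)
k_q, k_s = quantize(k, dim=2)   # per-channel keys: scale shared across tokens
v_q, v_s = quantize(v, dim=3)   # per-token values: scale shared across channels
print((dequantize(k_q, k_s) - k).abs().mean())  # small reconstruction error
```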

fast_robust_early_exit

50 stars · 7 forks

Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)
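
A minimal sketch of confidence-based early exiting in a decoder: after each layer, project the hidden state through the LM head and stop once the top-token probability clears a threshold. The toy layer stack and threshold rule are assumptions for illustration; the synchronized parallel decoding that gives this framework its speed is not shown.

```python
import torch
import torch.nn as nn

d_model, vocab, n_layers = 64, 100, 6
layers = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
     for _ in range(n_layers)])
lm_head = nn.Linear(d_model, vocab)

def forward_with_early_exit(h: torch.Tensor, threshold: float = 0.9):
    """Returns (next-token logits, number of layers actually executed)."""
    for i, layer in enumerate(layers):
        h = layer(h)
        logits = lm_head(h[:, -1])           # predict from the last position
        conf = logits.softmax(-1).max(-1).values
        if conf.item() >= threshold:         # confident enough: exit early
            return logits, i + 1
    return logits, n_layers                  # fell through: used every layer

h = torch.randn(1, 10, d_model)              # embeddings of a 10-token prefix
logits, used = forward_with_early_exit(h)
print(f"exited after {used}/{n_layers} layers")
```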

picollm

273 stars · 15 forks

On-device LLM Inference Powered by X-Bit Quantization
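
Low-bit on-device inference depends on storage tricks like the one sketched below: quantize weights to 4 bits and pack two values per byte. "X-bit" refers to picoLLM's own scheme; this uniform 4-bit packing is just an illustrative stand-in.

```python
import torch

def pack_int4(q: torch.Tensor) -> torch.Tensor:
    """q holds integers in [-8, 7]; pack pairs of values into uint8 bytes."""
    u = (q + 8).to(torch.uint8).flatten()        # shift to [0, 15]
    if u.numel() % 2:
        u = torch.cat([u, u.new_zeros(1)])       # pad to an even count
    return (u[0::2] << 4) | u[1::2]

def unpack_int4(packed: torch.Tensor, n: int) -> torch.Tensor:
    hi = (packed >> 4).to(torch.int8) - 8
    lo = (packed & 0xF).to(torch.int8) - 8
    return torch.stack([hi, lo], dim=1).flatten()[:n]

w = torch.randn(256, 256)
scale = w.abs().max() / 7
q = torch.clamp(torch.round(w / scale), -8, 7)
packed = pack_int4(q)                            # 32 KB instead of 256 KB fp32
w_hat = unpack_int4(packed, w.numel()).float().view_as(w) * scale
print((w - w_hat).abs().mean())                  # quantization error
```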

AsyncDiff

146 stars · 8 forks

[NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
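
A toy, single-process simulation of the asynchronous idea: the denoiser is split into sequential stages, and at each timestep stage i consumes what stage i-1 produced at the previous timestep. That staggered dependency is what would let the stages run concurrently on separate devices; this sketch models only the dataflow, not AsyncDiff's actual pipeline.

```python
import torch
import torch.nn as nn

stages = nn.ModuleList([nn.Linear(32, 32) for _ in range(3)])  # denoiser split in 3
x = torch.randn(1, 32)                         # noisy latent
cache = [x.clone() for _ in stages]            # stale inter-stage activations

for t in range(10):                            # denoising steps
    # All stages could execute in parallel here: each reads only cached
    # previous-step input, never an output produced during this step.
    cache = [stages[i](cache[i - 1] if i else x) for i in range(len(stages))]
    eps = cache[-1]                            # predicted noise
    x = x - 0.1 * eps                          # toy update rule, not a real sampler
```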

AutoVP

16 stars · 2 forks

[ICLR24] AutoVP: An Automated Visual Prompting Framework and Benchmark
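
A minimal sketch of pad-style visual prompting, one of the designs such a framework searches over: a frozen classifier receives the image with a learnable border, and only the border pixels are trained. AutoVP additionally automates prompt design and output-label mapping; none of that tuning is shown here.

```python
import torch
import torch.nn as nn

class PadPrompt(nn.Module):
    def __init__(self, image_size: int = 224, pad: int = 16):
        super().__init__()
        self.prompt = nn.Parameter(torch.zeros(3, image_size, image_size))
        mask = torch.zeros(1, image_size, image_size)
        mask[:, :pad] = 1; mask[:, -pad:] = 1        # top and bottom borders
        mask[:, :, :pad] = 1; mask[:, :, -pad:] = 1  # left and right borders
        self.register_buffer("mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Interior pixels pass through; border pixels come from the prompt.
        return x * (1 - self.mask) + self.prompt * self.mask

prompt = PadPrompt()
x = torch.randn(4, 3, 224, 224)
x_prompted = prompt(x)
# Training would optimize prompt.prompt while the backbone stays frozen.
```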

VidCom2

38 stars · 1 fork

[EMNLP 2025 Main] Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models
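
A minimal sketch of training-free visual-token compression for video LLMs: per frame, tokens whose features sit closest to the frame's mean are treated as redundant and dropped before the language model sees them. The scoring rule here is an illustrative assumption, not VidCom2's actual mechanism.

```python
import torch

def compress_video_tokens(tokens: torch.Tensor, keep_ratio: float = 0.25):
    """tokens: (frames, tokens_per_frame, dim) -> (frames, kept, dim)."""
    mean = tokens.mean(dim=1, keepdim=True)
    # Distinctiveness score: low similarity to the frame mean => keep.
    score = -torch.cosine_similarity(tokens, mean, dim=-1)
    n_keep = max(1, int(keep_ratio * tokens.shape[1]))
    idx = score.topk(n_keep, dim=1).indices.sort(dim=1).values
    return torch.gather(
        tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1]))

vid = torch.randn(16, 576, 1024)      # 16 frames of 24x24 ViT patch tokens
compact = compress_video_tokens(vid)  # (16, 144, 1024): 4x fewer tokens
```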

DGMR

21 stars · 0 forks

The official implementation of "Diversity-Guided MLP Reduction for Efficient Large Vision Transformers"

LLaVA-STF

29 stars · 2 forks

The official implementation of "Learning Compact Vision Tokens for Efficient Large Multimodal Models"