efficient-inference topic
Multistage_Pruning
Cheng-Hao Tu, Jia-Hong Lee, Yi-Ming Chan, and Chu-Song Chen, "Pruning Depthwise Separable Convolutions for MobileNet Compression," International Joint Conference on Neural Networks (IJCNN), July 2020.
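The repo implements a multistage pruning pipeline for depthwise separable convolutions; the exact staging is described in the paper. As a rough illustration of the core operation only, here is a minimal NumPy sketch of magnitude-based channel pruning on one depthwise separable block (all function and variable names are illustrative, not taken from the repo):

```python
import numpy as np

def prune_separable_block(dw_weight, pw_weight, keep_ratio=0.5):
    """Magnitude-prune the channels of a depthwise separable block.

    dw_weight: (C, k, k)   depthwise filters, one per input channel
    pw_weight: (C_out, C)  pointwise (1x1) filters mixing the C channels
    A channel's importance is the L1 norm of the pointwise column that
    consumes it; pruning removes the depthwise filter and the matching
    pointwise column together, so shapes stay consistent.
    """
    C = dw_weight.shape[0]
    keep = max(1, int(C * keep_ratio))
    importance = np.abs(pw_weight).sum(axis=0)      # per-channel L1 norm
    kept = np.sort(np.argsort(importance)[-keep:])  # indices of surviving channels
    return dw_weight[kept], pw_weight[:, kept], kept

# Toy example: prune an 8-channel block down to 4 channels.
rng = np.random.default_rng(0)
dw = rng.standard_normal((8, 3, 3))
pw = rng.standard_normal((16, 8))
dw_p, pw_p, kept = prune_separable_block(dw, pw, keep_ratio=0.5)
print(dw_p.shape, pw_p.shape, kept)  # (4, 3, 3) (16, 4) [...]
```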
KVQuant
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
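KVQuant's full method combines per-channel, pre-RoPE key quantization with non-uniform codebooks and outlier handling; the sketch below shows only the simplest ingredient, uniform low-bit quantization of a cached tensor with per-channel scales (names and shapes are illustrative assumptions, not the repo's API):

```python
import numpy as np

def quantize_per_channel(x, bits=4, axis=1):
    """Uniform asymmetric quantization with per-channel scale/zero-point.

    Stores the KV cache as low-bit integer codes plus a float scale and
    zero-point per channel, trading some accuracy for large memory
    savings at long context lengths.
    """
    qmax = 2**bits - 1
    lo = x.min(axis=axis, keepdims=True)
    hi = x.max(axis=axis, keepdims=True)
    scale = np.maximum(hi - lo, 1e-8) / qmax
    q = np.clip(np.round((x - lo) / scale), 0, qmax).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    return q.astype(np.float32) * scale + lo

# Toy key cache: (heads, seq_len, head_dim); reducing over the token
# axis gives one scale per (head, channel) pair.
keys = np.random.default_rng(0).standard_normal((8, 128, 64)).astype(np.float32)
q, s, z = quantize_per_channel(keys, bits=4, axis=1)
err = np.abs(dequantize(q, s, z) - keys).mean()
print(f"mean abs error at 4 bits: {err:.4f}")
```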
fast_robust_early_exit
[EMNLP 2023] Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (long paper)
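The paper's contribution is a synchronized parallel decoding scheme that makes early exiting fast and robust; the toy below shows only the baseline idea it builds on, confidence-based early exiting at intermediate layers during decoding (the model, head, and threshold are all illustrative stand-ins):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def early_exit_decode(layers, head, h, threshold=0.9):
    """Run the layers one by one; exit as soon as the LM head, applied
    to the intermediate hidden state, is confident enough.

    layers: list of callables h -> h (toy stand-ins for transformer blocks)
    head:   callable h -> logits over the vocabulary
    Returns (token, layers_used).
    """
    for i, layer in enumerate(layers, start=1):
        h = layer(h)
        probs = softmax(head(h))
        if probs.max() >= threshold:   # confident: skip the remaining layers
            return int(probs.argmax()), i
    return int(probs.argmax()), len(layers)

# Toy model: 6 residual "layers", a linear head, 32-token vocabulary.
rng = np.random.default_rng(0)
W = rng.standard_normal((32, 64)) * 0.1
layers = [lambda h, A=rng.standard_normal((64, 64)) * 0.05: np.tanh(h @ A) + h
          for _ in range(6)]
token, used = early_exit_decode(layers, lambda h: W @ h,
                                rng.standard_normal(64), threshold=0.5)
print(f"emitted token {token} after {used}/6 layers")
```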
picollm
On-device LLM Inference Powered by X-Bit Quantization
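The description names X-bit quantization without further detail here, so the following is a generic sketch of what sub-byte weight quantization involves: group-wise n-bit codes packed densely into bytes plus per-group scales (all names are assumptions, not picoLLM's API):

```python
import numpy as np

def pack_nbit(q, bits):
    """Pack an array of n-bit integer codes into a dense uint8 buffer."""
    q = np.asarray(q, dtype=np.uint8).ravel()
    bitstream = np.unpackbits(q[:, None], axis=1, bitorder="little")[:, :bits]
    return np.packbits(bitstream.ravel(), bitorder="little")

def quantize_group(w, bits=4, group=32):
    """Group-wise symmetric quantization: each run of `group` weights
    shares one float scale; codes land in [-2^(b-1), 2^(b-1)-1] and are
    offset by 2^(b-1) so they pack as unsigned n-bit values."""
    w = w.reshape(-1, group)
    qmax = 2**(bits - 1) - 1
    scale = np.maximum(np.abs(w).max(axis=1, keepdims=True), 1e-12) / qmax
    codes = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return pack_nbit(codes + 2**(bits - 1), bits), scale

w = np.random.default_rng(0).standard_normal(1024).astype(np.float32)
packed, scales = quantize_group(w, bits=4, group=32)
print(f"{w.nbytes} B fp32 -> {packed.nbytes} B packed + {scales.nbytes} B scales")
```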
AsyncDiff
[NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
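AsyncDiff splits the denoising model into sequential components placed on different devices, letting each component consume its predecessor's output from the previous denoising step so that all components of a step can run concurrently. The toy below only emulates that stale-input dataflow in NumPy to show the approximation it introduces; it is a sketch of the idea, not the repo's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "denoiser" split into 3 sequential stages (stand-ins for chunks
# of a diffusion network assigned to different devices).
stages = [lambda x, A=rng.standard_normal((16, 16)) * 0.1: np.tanh(A @ x)
          for _ in range(3)]

def denoise_sync(x, steps):
    """Baseline: at every step the stages run strictly in sequence."""
    for _ in range(steps):
        for f in stages:
            x = f(x)
    return x

def denoise_async(x, steps):
    """Asynchronous variant: stage k at step t consumes stage k-1's
    output from step t-1. The stale inputs break the dependency chain,
    so in a real deployment all stages of a step could run in parallel;
    here we just emulate the dataflow sequentially."""
    buf = [x.copy()] * (len(stages) + 1)   # buf[0] = latent, buf[k] = stage k output
    for _ in range(steps):
        buf = [buf[-1]] + [f(buf[k]) for k, f in enumerate(stages)]
    return buf[-1]

x0 = rng.standard_normal(16)
drift = np.abs(denoise_sync(x0, 8) - denoise_async(x0, 8)).mean()
print(f"mean deviation from sequential denoising: {drift:.4f}")
```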
AutoVP
[ICLR 2024] AutoVP: An Automated Visual Prompting Framework and Benchmark
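AutoVP automates the design choices of visual prompting (prompt design, source model selection, output-label mapping); the sketch below shows only the basic input transform visual prompting builds on, a trainable pixel frame around the image (shapes and names are illustrative, not the repo's API):

```python
import numpy as np

def apply_visual_prompt(image, prompt, pad):
    """Frame-style visual prompt: a trainable border of width `pad` is
    added around the input image; only the border pixels come from
    `prompt`, the interior keeps the original image.

    image:  (C, H, W) input
    prompt: (C, H + 2*pad, W + 2*pad) trainable parameter tensor
    """
    C, H, W = image.shape
    out = prompt.copy()
    out[:, pad:pad + H, pad:pad + W] = image  # interior = original pixels
    return out

# Toy usage: 3x192x192 image + 16-pixel learnable frame -> 3x224x224 input.
rng = np.random.default_rng(0)
img = rng.random((3, 192, 192)).astype(np.float32)
prompt = np.zeros((3, 224, 224), dtype=np.float32)  # trained by backprop in practice
prompted = apply_visual_prompt(img, prompt, pad=16)
print(prompted.shape)  # (3, 224, 224)
```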