efficient-inference topic
graphless-neural-networks
[ICLR 2022] Code for Graph-less Neural Networks: Teaching Old MLPs New Tricks via Distillation (GLNN)
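A minimal sketch of the distillation idea behind GLNN, assuming teacher logits have already been precomputed by a GNN: an MLP is trained on raw node features against a weighted sum of the hard-label loss and a KL term toward the teacher's soft labels, so inference needs no graph at all. The tensor names and sizes here are illustrative placeholders, not the repo's API.

```python
import torch
import torch.nn.functional as F

feats = torch.randn(100, 16)            # node features (no graph needed)
labels = torch.randint(0, 4, (100,))    # ground-truth classes
teacher_logits = torch.randn(100, 4)    # soft labels precomputed by a GNN teacher

mlp = torch.nn.Sequential(
    torch.nn.Linear(16, 64), torch.nn.ReLU(), torch.nn.Linear(64, 4))
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)

for _ in range(200):
    logits = mlp(feats)
    # weighted sum of the label loss and KL to the teacher's soft labels
    loss = 0.5 * F.cross_entropy(logits, labels) + \
           0.5 * F.kl_div(F.log_softmax(logits, dim=1),
                          F.softmax(teacher_logits, dim=1),
                          reduction="batchmean")
    opt.zero_grad(); loss.backward(); opt.step()
```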
SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
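A rough NumPy sketch of the dense-and-sparse decomposition named in the title, under the simplifying assumption of uniform quantization: the few largest-magnitude outlier weights are pulled into a sparse full-precision matrix, and only the well-behaved dense remainder is quantized to low bit-width. SqueezeLLM itself pairs this with sensitivity-based non-uniform quantization; the function names below are hypothetical.

```python
import numpy as np

def dense_and_sparse(W, outlier_frac=0.005, bits=4):
    """Split W into sparse fp outliers plus a low-bit dense remainder."""
    thresh = np.quantile(np.abs(W), 1.0 - outlier_frac)
    sparse = np.where(np.abs(W) >= thresh, W, 0.0)  # outliers kept in full precision
    dense = W - sparse
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit symmetric
    scale = np.abs(dense).max() / qmax
    q = np.round(dense / scale).astype(np.int8)     # quantized dense part
    return q, scale, sparse

def dequantize(q, scale, sparse):
    return q.astype(np.float32) * scale + sparse

W = np.random.randn(256, 256).astype(np.float32)
q, scale, sparse = dense_and_sparse(W)
print("max reconstruction error:", np.abs(dequantize(q, scale, sparse) - W).max())
```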
DeepCache
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
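A rough sketch of the caching pattern DeepCache exploits: the deep, expensive U-Net blocks change slowly between adjacent diffusion timesteps, so their output can be recomputed only every few steps and reused in between, while the cheap shallow path still runs every step. `shallow`, `deep`, and `head` are hypothetical stand-ins for a real U-Net's block partition, not the repo's code.

```python
import torch

shallow = torch.nn.Linear(32, 32)   # cheap per-step blocks
deep = torch.nn.Linear(32, 32)      # expensive blocks, candidates for caching
head = torch.nn.Linear(64, 32)      # combines skip features with deep features

def sample(x, steps=50, cache_interval=5):
    cached_deep = None
    for t in range(steps):
        h = shallow(x)                        # always run the cheap path
        if cached_deep is None or t % cache_interval == 0:
            cached_deep = deep(h)             # full forward only on refresh steps
        out = head(torch.cat([h, cached_deep], dim=-1))  # skip + cached deep features
        x = x - 0.01 * out                    # stand-in for a denoising update
    return x

print(sample(torch.randn(1, 32)).shape)
```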
LLMCompiler
[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
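A minimal sketch of the parallel function-calling pattern LLMCompiler targets: tool calls that a plan identifies as independent are dispatched concurrently instead of one at a time. The tool and queries here are hypothetical; the real system derives the call dependency graph from an LLM planner.

```python
import asyncio

async def search(query):                # hypothetical tool call
    await asyncio.sleep(1.0)            # simulated API latency
    return f"results for {query!r}"

async def main():
    # two independent calls from the plan run in parallel (~1s total, not ~2s)
    a, b = await asyncio.gather(search("flight prices"),
                                search("hotel prices"))
    print(a, b, sep="\n")

asyncio.run(main())
```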
BigLittleDecoder
[NeurIPS 2023] Speculative Decoding with Big Little Decoder
speculative-decoding
Explorations of some recent techniques for speculative decoding
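A minimal NumPy sketch of the draft-then-verify loop common to these techniques (and to the Big Little Decoder entry above): a cheap draft model proposes several tokens, and the target model accepts each with probability min(1, p/q), resampling from the residual distribution at the first rejection so the output distribution stays exact. The toy model functions are hypothetical stand-ins, and the bonus token drawn when every draft is accepted is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 8

def draft_probs(ctx):   # hypothetical cheap draft model
    p = rng.random(VOCAB); return p / p.sum()

def target_probs(ctx):  # hypothetical expensive target model
    p = rng.random(VOCAB); return p / p.sum()

def speculative_step(context, k=4):
    # 1) draft k tokens cheaply, remembering each proposal distribution
    tokens, proposals, ctx = [], [], list(context)
    for _ in range(k):
        q = draft_probs(ctx)
        t = int(rng.choice(VOCAB, p=q))
        tokens.append(t); proposals.append(q); ctx.append(t)

    # 2) verify with the target model
    out, ctx = [], list(context)
    for t, q in zip(tokens, proposals):
        p = target_probs(ctx)
        if rng.random() < min(1.0, p[t] / q[t]):
            out.append(t); ctx.append(t)        # draft token accepted
        else:
            residual = np.maximum(p - q, 0.0)   # resample from the residual
            out.append(int(rng.choice(VOCAB, p=residual / residual.sum())))
            break
    return out

print(speculative_step([1, 2, 3]))
```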
lzu
[CVPR 2023] Code for Learning to Zoom and Unzoom
TinyML-Benchmark-NNs-on-MCUs
Code for the WF-IoT paper "TinyML Benchmark: Executing Fully Connected Neural Networks on Commodity Microcontrollers"
triple-wins
[ICLR 2020] "Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference"
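A minimal sketch of the input-adaptive inference mechanism the paper builds on, assuming a toy multi-exit network: intermediate classifiers let confident ("easy") inputs exit early, so depth, and therefore compute, is spent only on hard inputs. The block sizes and confidence threshold are illustrative, not the paper's RDI-Net architecture.

```python
import torch
import torch.nn.functional as F

blocks = torch.nn.ModuleList(
    [torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU())
     for _ in range(3)])
exits = torch.nn.ModuleList([torch.nn.Linear(16, 4) for _ in range(3)])

@torch.no_grad()
def adaptive_forward(x, threshold=0.9):
    h = x
    for i, (block, clf) in enumerate(zip(blocks, exits)):
        h = block(h)
        probs = F.softmax(clf(h), dim=-1)
        conf, pred = probs.max(dim=-1)
        if conf.item() >= threshold or i == len(blocks) - 1:
            return pred.item(), i   # prediction and the exit that produced it

print(adaptive_forward(torch.randn(1, 16)))
```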
LightGaussian
[NeurIPS 2024 Spotlight] "LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS", Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang
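A rough sketch of the pruning step behind this kind of compression, assuming per-Gaussian opacities and scales are available: rank Gaussians by a significance proxy and keep only the most important ones. The score below (opacity times a volume proxy) is a simplification of the paper's global-significance criterion, which it further combines with SH distillation and vector quantization to reach the reported 15x reduction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
opacity = rng.random(n)           # per-Gaussian opacity
scale = rng.random((n, 3))        # per-Gaussian axis scales

significance = opacity * scale.prod(axis=1)    # crude importance proxy
keep = np.argsort(significance)[-(n // 3):]    # keep the top third (~3x fewer)

print(f"kept {keep.size} of {n} Gaussians")
```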