MIT HAN Lab
TinyChatEngine
TinyChatEngine: On-Device LLM Inference Library
efficientvit
EfficientViT: a family of efficient vision models for high-resolution vision tasks
spatten
[HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
offsite-tuning
Offsite-Tuning: Transfer Learning without Full Model
distrifuser
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
flatformer
[CVPR'23] FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer
llm-awq
[MLSys 2024] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
sparsevit
[CVPR'23] SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks