
Awesome machine learning model compression research papers, tools, and learning material.

Awesome ML Model Compression

An awesome-style list that curates the best machine learning model compression and acceleration research papers, articles, tutorials, libraries, tools, and more. PRs are welcome!

Contents

  • Papers
    • General
    • Architecture
    • Quantization
    • Binarization
    • Pruning
    • Distillation
    • Low Rank Approximation
  • Articles
    • Howtos
    • Assorted
    • Reference
    • Blogs
  • Tools
    • Libraries
    • Frameworks
    • Paper Implementations
  • Videos
    • Talks
    • Training & tutorials

Papers

General

Architecture

Quantization

Binarization

Pruning

Distillation

Low Rank Approximation

Articles

Content published on the Web.

Howtos

Assorted

Reference

Blogs

Tools

Libraries

  • TensorFlow Model Optimization Toolkit. Accompanying blog post: TensorFlow Model Optimization Toolkit — Pruning API
  • XNNPACK is a highly optimized library of floating-point neural network inference operators for ARM, WebAssembly, and x86 (SSE2 level) platforms. It is based on the QNNPACK library but, unlike QNNPACK, focuses entirely on floating-point operators.
  • Bitsandbytes is a lightweight wrapper around CUDA custom functions, in particular 8-bit optimizers and quantization functions.
  • NNCP - An experiment to build a practical lossless data compressor with neural networks. The latest version uses a Transformer model (slower but best ratio). LSTM (faster) is also available.
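The core idea behind the 8-bit quantization functions that libraries like Bitsandbytes provide can be illustrated with a minimal sketch of symmetric absmax quantization. This is a plain-Python illustration of the technique, not Bitsandbytes' actual API; the function names are hypothetical:

```python
def quantize_absmax(values, bits=8):
    """Symmetric absmax quantization: map floats onto signed ints
    in [-qmax, qmax], scaled by the largest absolute value."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit
    absmax = max(abs(v) for v in values) or 1.0
    scale = qmax / absmax
    # Round each value to the nearest integer step and clip to range.
    q = [max(-qmax, min(qmax, round(v * scale))) for v in values]
    return q, scale

def dequantize_absmax(q, scale):
    """Recover approximate floats from the quantized integers."""
    return [v / scale for v in q]

weights = [0.1, -0.5, 0.25, 1.0, -1.0]
q, scale = quantize_absmax(weights)
restored = dequantize_absmax(q, scale)
```

Each float is stored as a single signed byte plus one shared scale factor, which is where the ~4x memory saving over float32 comes from; the round-trip error per value is bounded by half a quantization step (0.5 / scale).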

Frameworks

Paper Implementations

  • facebookresearch/kill-the-bits - code and compressed models for the paper, "And the Bit Goes Down: Revisiting the Quantization of Neural Networks" by Facebook AI Research.

Videos

Talks

Training & tutorials

License

CC0

To the extent possible under law, Cedric Chee has waived all copyright and related or neighboring rights to this work.