
Neural Networks Quantization

Linear Quantization

Post-Training Quantization

Quantization-Aware Training

State of the Art

4-bit
2-bit

Annealing from Continuous to Discrete

Non-Uniform Quantization

Sparsity and Quantization

Quantization Support in Libraries

Post-Training Quantization

  • TensorRT
    • Per channel weight scale
    • Calibration: minimize KL Divergence
  • TensorFlow Lite
    • Per channel weight scale
    • Calibration: min/max (to be confirmed)
  • TVM
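
The two ideas named in the list above, a per-channel weight scale and min/max calibration, can be sketched in plain Python. This is only an illustration under stated assumptions, not the code path of TensorRT, TensorFlow Lite, or TVM: weights are assumed to use a symmetric signed range with one scale per output channel (row), and activations are assumed to use an asymmetric unsigned range derived from the observed minimum and maximum. The function names (`per_channel_scales`, `quantize_per_channel`, `minmax_calibrate`) are hypothetical.

```python
from typing import List, Tuple


def per_channel_scales(weights: List[List[float]], num_bits: int = 8) -> List[float]:
    """Return one symmetric scale per output channel (row): max|w| / qmax."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    scales = []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        # An all-zero channel gets a dummy scale to avoid division by zero.
        scales.append(max_abs / qmax if max_abs > 0 else 1.0)
    return scales


def quantize_per_channel(
    weights: List[List[float]], scales: List[float], num_bits: int = 8
) -> List[List[int]]:
    """Round each weight to the nearest integer step of its channel's scale."""
    qmax = 2 ** (num_bits - 1) - 1
    return [
        [max(-qmax, min(qmax, round(w / s))) for w in row]
        for row, s in zip(weights, scales)
    ]


def minmax_calibrate(
    batches: List[List[float]], num_bits: int = 8
) -> Tuple[float, int]:
    """Derive an asymmetric (scale, zero_point) from observed activation min/max."""
    lo = min(min(b) for b in batches)
    hi = max(max(b) for b in batches)
    # Assumption: the range is widened to include 0 so that 0.0 is exactly representable.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = int(round(qmin - lo / scale))
    return scale, max(qmin, min(qmax, zero_point))


if __name__ == "__main__":
    w = [[0.5, -1.27], [0.005, 0.02]]
    s = per_channel_scales(w)
    print(quantize_per_channel(w, s))
    print(minmax_calibrate([[0.0, 1.0], [-1.0, 2.0]]))
```

With a single per-tensor scale, the second row in the usage example would be rounded to nearly zero because the first row dominates the range; a scale per channel keeps both rows at full integer resolution. Min/max calibration is the simplest choice and is sensitive to outliers, which is the motivation for the KL-divergence calibration noted under TensorRT.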

LUT-based

https://arxiv.org/pdf/1906.04798.pdf