post-training-quantization topic
Adventures-in-TensorFlow-Lite
This repository contains notebooks that demonstrate how to use TensorFlow Lite to quantize deep neural networks.
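For orientation, a minimal sketch of TensorFlow Lite post-training (dynamic-range) quantization, the flow notebooks like these typically cover; the saved-model path and output file name are placeholders:

```python
import tensorflow as tf

# Load a trained model and enable post-training quantization.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic-range: int8 weights
tflite_model = converter.convert()

with open("model_quant.tflite", "wb") as f:
    f.write(tflite_model)
```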
micronet
micronet, a model compression and deployment library. Compression: 1. quantization: quantization-aware training (QAT), including high-bit (>2-bit) methods (DoReFa; Quantization and Training of Neural Networks for Efficient Integer-Ar...)
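Since this entry highlights QAT, here is a generic eager-mode QAT sketch in PyTorch (not micronet's own API; the toy model and training data are stand-ins): fake-quantization modules are inserted, the model is briefly fine-tuned, then converted to real int8 ops.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (QuantStub, DeQuantStub,
                                   get_default_qat_qconfig, prepare_qat, convert)

class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant, self.dequant = QuantStub(), DeQuantStub()  # fp32 <-> int8 boundaries
        self.fc = nn.Linear(16, 4)

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = ToyNet().train()
model.qconfig = get_default_qat_qconfig("fbgemm")
prepare_qat(model, inplace=True)            # insert fake-quant modules

opt = torch.optim.SGD(model.parameters(), lr=1e-2)
for _ in range(10):                         # brief fine-tuning under fake quantization
    x, y = torch.randn(32, 16), torch.randn(32, 4)
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

quantized = convert(model.eval())           # swap in real int8 modules
```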
TinyNeuralNetwork
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
neural-compressor
State-of-the-art low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) and sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime.
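A hedged sketch of what a post-training quantization call looks like with neural-compressor's 2.x-style Python API; the exact class and function names may differ across releases, and the model and calibration loader here are placeholders:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

float_model = nn.Sequential(nn.Linear(16, 4))  # placeholder model
calib_loader = DataLoader(
    TensorDataset(torch.randn(64, 16), torch.zeros(64)),  # (input, label) pairs
    batch_size=8,
)

# fit() inserts observers, calibrates on the loader, and returns a quantized model.
q_model = fit(model=float_model,
              conf=PostTrainingQuantConfig(),
              calib_dataloader=calib_loader)
```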
Sparsebit
A model compression and acceleration toolbox based on PyTorch.
FQ-ViT
[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer
static_quantization
Post-training static quantization using the ResNet18 architecture.
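The eager-mode flow behind a repo like this, sketched here with a toy module standing in for ResNet18 (the model and calibration data are assumptions, not the repo's code): insert observers, run representative data through the model so the observers record activation ranges, then convert to int8.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (QuantStub, DeQuantStub,
                                   get_default_qconfig, prepare, convert)

class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # fp32 -> int8 at model entry
        self.fc = nn.Linear(16, 4)
        self.dequant = DeQuantStub()  # int8 -> fp32 at model exit

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = ToyNet().eval()
model.qconfig = get_default_qconfig("fbgemm")  # x86 backend
prepare(model, inplace=True)                   # insert observers

# Calibrate with representative data so observers record ranges.
with torch.no_grad():
    for _ in range(8):
        model(torch.randn(32, 16))

convert(model, inplace=True)                   # swap in int8 modules
```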
SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
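The "dense-and-sparse" idea, loosely illustrated (this sketch shows the decomposition only, not SqueezeLLM's actual algorithm): a small fraction of large-magnitude outlier weights is kept in full precision as a sparse matrix, and the dense remainder is quantized to low bit-width.

```python
import torch

def dense_and_sparse(w: torch.Tensor, bits: int = 4, outlier_frac: float = 0.005):
    """Split outliers into a sparse fp matrix; quantize the dense remainder."""
    k = max(1, int(w.numel() * outlier_frac))
    thresh = w.abs().flatten().topk(k).values.min()
    sparse = torch.where(w.abs() >= thresh, w, torch.zeros_like(w))  # kept in fp
    dense = w - sparse                                               # to be quantized
    qmax = 2 ** (bits - 1) - 1                                       # e.g. 7 for 4-bit
    scale = dense.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(dense / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale, sparse.to_sparse()

w = torch.randn(64, 64)
q, scale, sparse = dense_and_sparse(w)
w_hat = q.float() * scale + sparse.to_dense()  # dequantized approximation
print((w - w_hat).abs().max())                 # worst-case reconstruction error
```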
q-diffusion
[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.
QLLM
[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models"