int4 topic

Repositories tagged with the int4 topic:

how-to-optimize-gemm

583 Stars · 78 Forks

Row-major matmul optimization.
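The repository's tutorials are written in C, but the core idea behind row-major matmul optimization, reordering the loops so the innermost loop streams contiguous rows of B and C instead of striding down columns, can be sketched in a few lines. This is a minimal illustration of the loop-ordering concept, not code from the repository:

```python
import numpy as np

def matmul_ikj(A, B):
    """Naive GEMM with i-k-j loop order: for row-major arrays, the inner
    loop over j reads B[p, :] and writes C[i, :] contiguously, which is
    far friendlier to the cache than the textbook i-j-k order."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n))
    for i in range(m):
        for p in range(k):
            a = A[i, p]              # scalar reused across the whole row
            for j in range(n):
                C[i, j] += a * B[p, j]
    return C

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 7))
B = rng.normal(size=(7, 3))
C = matmul_ikj(A, B)
```

The repository proceeds from variants like this toward register blocking and vectorization.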

neural-compressor

2.2k Stars · 254 Forks · 24 Watchers

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

neural-speed

346 Stars · 38 Forks

An innovative library for efficient LLM inference via low-bit quantization

auto-round

222 Stars · 19 Forks

Advanced quantization algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs".
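The paper's core idea is that instead of always rounding weights to the nearest level, one can learn a small per-weight offset, updated by the sign of a straight-through gradient, that decides whether each weight rounds up or down so as to minimize output reconstruction error. The toy below is my own sketch of that idea on random data, not the repository's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
X = rng.normal(size=(32, 8))          # stand-in calibration activations

s = np.abs(W).max() / 7.0             # single symmetric int4 scale for the toy

def qdq(v):
    # Quantize with learned rounding offset v, then dequantize.
    q = np.clip(np.round(W / s + v), -8, 7)
    return q * s

def loss(v):
    # Output-reconstruction error of the quantized layer on calibration data.
    d = X @ (qdq(v) - W)
    return float((d * d).sum())

v = np.zeros_like(W)                  # v in [-0.5, 0.5] shifts round-up/down
init_loss = loss(v)
best_loss, lr = init_loss, 5e-3
for _ in range(200):
    err = X @ (qdq(v) - W)
    g = (X.T @ err) * s               # straight-through gradient w.r.t. v
    v = np.clip(v - lr * np.sign(g), -0.5, 0.5)   # signed gradient step
    best_loss = min(best_loss, loss(v))
```

Clipping v to [-0.5, 0.5] means each weight can move by at most one quantization level relative to nearest rounding, which keeps the search well-conditioned.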