int4 topic

Repositories tagged with the int4 topic:

how-to-optimize-gemm

583 Stars · 78 Forks

Row-major matmul optimization.
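The repository's tutorials are written in C, but the core idea behind row-major matmul optimization, reordering the loops so the innermost loop streams contiguous rows of B and C instead of striding down columns, can be sketched in a few lines. This is a minimal illustration of the loop-ordering concept, not code from the repository:

```python
import numpy as np

def matmul_ikj(A, B):
    """Naive GEMM with i-k-j loop order: for row-major arrays, the inner
    loop over j reads B[p, :] and writes C[i, :] contiguously, which is
    far friendlier to the cache than the textbook i-j-k order."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n))
    for i in range(m):
        for p in range(k):
            a = A[i, p]              # scalar reused across the whole row
            for j in range(n):
                C[i, j] += a * B[p, j]
    return C

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 7))
B = rng.normal(size=(7, 3))
C = matmul_ikj(A, B)
```

The repository proceeds from variants like this toward register blocking and vectorization.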

neural-compressor

2.2k Stars · 254 Forks · 24 Watchers

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

neural-speed

346 Stars · 38 Forks

An innovative library for efficient LLM inference via low-bit quantization

auto-round

222 Stars · 19 Forks

Advanced quantization algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs".
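The paper's core idea is that instead of always rounding weights to the nearest level, one can learn a small per-weight offset, updated by the sign of a straight-through gradient, that decides whether each weight rounds up or down so as to minimize output reconstruction error. The toy below is my own sketch of that idea on random data, not the repository's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
X = rng.normal(size=(32, 8))          # stand-in calibration activations

s = np.abs(W).max() / 7.0             # single symmetric int4 scale for the toy

def qdq(v):
    # Quantize with learned rounding offset v, then dequantize.
    q = np.clip(np.round(W / s + v), -8, 7)
    return q * s

def loss(v):
    # Output-reconstruction error of the quantized layer on calibration data.
    d = X @ (qdq(v) - W)
    return float((d * d).sum())

v = np.zeros_like(W)                  # v in [-0.5, 0.5] shifts round-up/down
init_loss = loss(v)
best_loss, lr = init_loss, 5e-3
for _ in range(200):
    err = X @ (qdq(v) - W)
    g = (X.T @ err) * s               # straight-through gradient w.r.t. v
    v = np.clip(v - lr * np.sign(g), -0.5, 0.5)   # signed gradient step
    best_loss = min(best_loss, loss(v))
```

Clipping v to [-0.5, 0.5] means each weight can move by at most one quantization level relative to nearest rounding, which keeps the search well-conditioned.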