fp4 topic

Repositories tagged with the fp4 topic

neural-compressor

2.0k Stars, 241 Forks, 24 Watchers

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

neural-speed

273 Stars, 31 Forks

An innovative library for efficient LLM inference via low-bit quantization