awq topic

A list of repositories tagged with awq.

neural-compressor

2.0k Stars · 241 Forks · 24 Watchers

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
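
For orientation, here is a minimal sketch of the kind of group-wise weight-only INT4 quantization such libraries implement, written in plain PyTorch. It is illustrative only and does not use the neural-compressor API; the function name and group size are assumptions.

```python
# Conceptual sketch of group-wise weight-only INT4 quantization; all names
# here are illustrative, and this is not the neural-compressor API.
import torch

def quantize_weight_int4(w: torch.Tensor, group_size: int = 128):
    """Symmetric per-group INT4 quantization of a 2-D weight matrix."""
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    wg = w.reshape(out_features, in_features // group_size, group_size)

    # One scale per group: map the largest absolute value onto the INT4 range [-8, 7].
    scales = (wg.abs().amax(dim=-1, keepdim=True) / 7.0).clamp(min=1e-8)
    q = torch.clamp(torch.round(wg / scales), -8, 7)

    # Dequantized approximation, handy for measuring quantization error.
    w_hat = (q * scales).reshape(out_features, in_features)
    return q.to(torch.int8), scales, w_hat

w = torch.randn(4096, 4096)
q, scales, w_hat = quantize_weight_int4(w)
print("mean abs quantization error:", (w - w_hat).abs().mean().item())
```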

Awesome-LLM-Inference

1.5k Stars · 118 Forks

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
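
Since AWQ is the topic of this page, a toy sketch of its core idea may help: scale up salient weight channels according to calibration-activation magnitude before low-bit quantization, and fold the inverse scale back into the input side. This is a conceptual illustration under assumed names, not the official AWQ implementation.

```python
# Toy illustration of AWQ-style activation-aware scaling; the names and the
# fixed alpha are assumptions, and this is not the official AWQ code.
import torch

def awq_style_scale(w: torch.Tensor, act_sample: torch.Tensor, alpha: float = 0.5):
    """w: (out_features, in_features); act_sample: (n, in_features) calibration activations."""
    # Per-input-channel activation magnitude acts as the salience signal.
    act_scale = act_sample.abs().mean(dim=0).clamp(min=1e-5)
    s = act_scale.pow(alpha)          # AWQ searches this exponent; here it is fixed
    w_scaled = w * s                  # scale salient input channels up before quantizing
    return w_scaled, 1.0 / s          # the inverse scale is folded into the input side

w = torch.randn(1024, 1024)
x = torch.randn(32, 1024)
w_scaled, inv_s = awq_style_scale(w, x)
# Quantizing w_scaled and multiplying inputs by inv_s leaves the layer's output
# unchanged before rounding, while making salient channels easier to quantize.
```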

swift

1.7k Stars · 168 Forks · 12 Watchers

ms-swift: use PEFT or full-parameter training to fine-tune 250+ LLMs and 25+ MLLMs
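
As a rough idea of what parameter-efficient fine-tuning looks like, here is a minimal LoRA sketch using the Hugging Face transformers and peft libraries (which ms-swift builds on). The model name and hyperparameters are placeholders, not ms-swift defaults.

```python
# Minimal PEFT (LoRA) sketch with Hugging Face transformers + peft, the kind
# of parameter-efficient fine-tuning ms-swift wraps. Model name and
# hyperparameters are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "Qwen/Qwen2-0.5B"  # placeholder; any causal LM works
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

lora_config = LoraConfig(
    r=8,                                   # low-rank adapter dimension
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
# From here, train with the usual transformers Trainer or a custom loop.
```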

auto-round

81 Stars · 9 Forks

SOTA weight-only quantization algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs".
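
A toy sketch of the signed-gradient-descent rounding idea behind this repository: learn a bounded per-weight rounding offset that reduces the quantized layer's output error, updating the offset with the sign of its gradient. This is a conceptual illustration, not the official auto-round code; the straight-through rounding and hyperparameters are assumptions.

```python
# Toy sketch of signed-gradient-descent weight rounding; hyperparameters and
# the straight-through rounding are assumptions, not the official auto-round code.
import torch

def ste_round(t):
    # Straight-through estimator: round in the forward pass, identity gradient backward.
    return (torch.round(t) - t).detach() + t

def signround_layer(w, x, bits=4, steps=200, lr=5e-3):
    """w: (out, in) weight; x: (n, in) calibration inputs."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().amax(dim=1, keepdim=True) / qmax   # per-row quantization scale
    v = torch.zeros_like(w, requires_grad=True)        # learnable rounding offset
    y_ref = x @ w.t()                                   # full-precision reference output

    for _ in range(steps):
        v_c = 0.5 * torch.tanh(v)                       # keep offset within half a step
        w_hat = torch.clamp(ste_round(w / scale + v_c), -qmax - 1, qmax) * scale
        loss = ((x @ w_hat.t()) - y_ref).pow(2).mean()
        loss.backward()
        with torch.no_grad():
            v -= lr * v.grad.sign()                     # signed gradient descent step
            v.grad.zero_()

    with torch.no_grad():
        q = torch.clamp(torch.round(w / scale + 0.5 * torch.tanh(v)), -qmax - 1, qmax)
        return q * scale

w, x = torch.randn(256, 256), torch.randn(64, 256)
w_q = signround_layer(w, x)
print("output reconstruction error:", ((x @ w_q.t()) - (x @ w.t())).abs().mean().item())
```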