awq topic

Repositories
neural-compressor — 2.0k stars, 241 forks, 24 watchers
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Awesome-LLM-Inference — 1.5k stars, 118 forks
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
swift — 1.7k stars, 168 forks, 12 watchers
ms-swift: use PEFT or full-parameter training to finetune 250+ LLMs or 25+ MLLMs
auto-round — 81 stars, 9 forks
SOTA weight-only quantization algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs".