gptq topic
neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
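For orientation, a minimal sketch of post-training INT8 quantization with neural-compressor's 2.x-style Python API; the toy model and calibration dataloader are hypothetical placeholders, and the exact API surface may differ between releases.

```python
# Hedged sketch of neural-compressor post-training quantization (2.x-style API).
# `fp32_model` and `calib_dataloader` are hypothetical placeholders you supply.
import torch
from neural_compressor import PostTrainingQuantConfig, quantization

fp32_model = torch.nn.Sequential(  # stand-in for a real PyTorch model
    torch.nn.Linear(128, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
)

# A handful of (input, label) pairs is enough to calibrate this toy model.
calib_data = [(torch.randn(1, 128), torch.tensor([0])) for _ in range(16)]
calib_dataloader = torch.utils.data.DataLoader(calib_data, batch_size=1)

# The default config targets INT8 post-training static quantization.
conf = PostTrainingQuantConfig()
q_model = quantization.fit(fp32_model, conf, calib_dataloader=calib_dataloader)
q_model.save("./quantized_model")
```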
ialacol
🪶 Lightweight OpenAI drop-in replacement for Kubernetes
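Because ialacol exposes an OpenAI-compatible API, the standard OpenAI Python client (v1) can point at it directly; the base URL and model name below are hypothetical and depend on your Kubernetes service and deployed model.

```python
# Hedged sketch: querying an OpenAI-compatible server (such as ialacol)
# with the official openai Python client (v1 API).
from openai import OpenAI

client = OpenAI(
    base_url="http://ialacol.default.svc.cluster.local:8000/v1",  # hypothetical service URL
    api_key="not-needed",  # many self-hosted servers ignore the key
)

response = client.chat.completions.create(
    model="llama-2-7b-chat",  # hypothetical deployed model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```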
LLaMA-Cult-and-More
Large Language Models for All, 🦙 Cult and More. Stay in touch!
xllm
🦖 X—LLM: Cutting Edge & Easy LLM Finetuning
gptq_for_langchain
A guide on using GPTQ-quantized models with LangChain
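As a rough illustration of the pattern such a guide covers, the sketch below loads a prequantized GPTQ checkpoint through transformers and wraps it as a LangChain LLM; the model ID is only an example, a GPTQ backend (e.g. auto-gptq/optimum) must be installed, and the LangChain import path varies by version.

```python
# Hedged sketch: wrapping a GPTQ-quantized model in a LangChain LLM.
# The model ID is an example of a prequantized GPTQ checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_community.llms import HuggingFacePipeline

model_id = "TheBloke/Llama-2-7B-Chat-GPTQ"  # example GPTQ checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=128)
llm = HuggingFacePipeline(pipeline=pipe)

print(llm.invoke("Explain GPTQ quantization in one sentence."))
```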
llm-api
Run any Large Language Model behind a unified API
auto-round
Advanced quantization algorithm for LLMs; the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs"
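For context, a minimal sketch of the weight-only 4-bit quantization flow auto-round's README describes, assuming its AutoRound API; argument names and defaults may differ between releases.

```python
# Hedged sketch of weight-only INT4 quantization with auto-round.
# API shape follows the project's README-style usage; details may vary by version.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # small model used here purely as an example
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

bits, group_size, sym = 4, 128, True
autoround = AutoRound(model, tokenizer, bits=bits, group_size=group_size, sym=sym)
autoround.quantize()
autoround.save_quantized("./opt-125m-int4")  # export the INT4 checkpoint
```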