GPTQ topic
neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
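For a feel of the workflow, here is a minimal post-training quantization sketch against Neural Compressor's 2.x PyTorch API (PostTrainingQuantConfig plus quantization.fit); the toy model and calibration loader are stand-ins, not part of the project's own examples.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor import PostTrainingQuantConfig, quantization

# Toy model and calibration data stand in for a real network and dataset.
model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 4))
calib = DataLoader(TensorDataset(torch.randn(64, 16), torch.zeros(64, dtype=torch.long)), batch_size=8)

conf = PostTrainingQuantConfig(approach="static")   # post-training static INT8
q_model = quantization.fit(model, conf, calib_dataloader=calib)
q_model.save("./quantized_model")
```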
ialacol
🪶 Lightweight OpenAI drop-in replacement for Kubernetes
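Because it is an OpenAI drop-in, the stock openai client works against it once pointed at the in-cluster service; the service URL and model name below are placeholders for whatever a given deployment exposes.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://ialacol.default.svc.cluster.local:8000/v1",  # hypothetical in-cluster service
    api_key="not-needed",  # drop-in servers typically ignore the key
)
resp = client.chat.completions.create(
    model="llama-2-7b-chat",  # placeholder model id
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```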
LLaMA-Cult-and-More
Large Language Models for all, 🦙 Cult and More. Stay in touch!
xllm
🦖 X—LLM: Cutting-Edge & Easy LLM Finetuning
gptq_for_langchain
A guide on how to use GPTQ models with LangChain
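One common recipe (a sketch, not necessarily the guide's exact code): load the GPTQ checkpoint with transformers, which dispatches to an installed GPTQ backend, then wrap the pipeline for LangChain. The model id is an example from the Hub.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_community.llms import HuggingFacePipeline

model_id = "TheBloke/Llama-2-7B-Chat-GPTQ"  # example GPTQ checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # needs a GPTQ backend installed

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=128)
llm = HuggingFacePipeline(pipeline=pipe)
print(llm.invoke("What is GPTQ quantization?"))
```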
llm-api
Run any Large Language Model behind a unified API
auto-round
Advanced quantization algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs"
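The core idea is easy to illustrate: learn a per-weight rounding offset v in [-0.5, 0.5] and update it with the sign of its gradient so that the rounded layer reproduces the full-precision output. The toy PyTorch sketch below shows that mechanism only; it is not the official implementation.

```python
import torch

torch.manual_seed(0)
w = torch.randn(64, 64)            # weight matrix to quantize
s = w.abs().max() / 7              # single scale for the signed 4-bit range [-8, 7]
x = torch.randn(256, 64)           # calibration activations
y_ref = x @ w.T                    # full-precision layer output

v = torch.zeros_like(w, requires_grad=True)  # learnable rounding offset
lr = 0.005
for _ in range(200):
    q = torch.round(w / s + v)
    q = (q - (w / s + v)).detach() + (w / s + v)  # straight-through estimator for round()
    w_q = torch.clamp(q, -8, 7) * s
    loss = ((x @ w_q.T - y_ref) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        v -= lr * v.grad.sign()    # signed gradient descent step
        v.clamp_(-0.5, 0.5)        # keep the offset in its valid range
        v.grad.zero_()
print(f"final reconstruction MSE: {loss.item():.6f}")
```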
GPTQModel
LLM quantization (compression) toolkit with hardware acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU, and Intel/AMD/Apple CPUs via HF, vLLM, and SGLang.
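A quantization run looks roughly like the sketch below, assuming the high-level QuantizeConfig / GPTQModel.load / quantize / save API from the project's README; the model id and calibration text are placeholders, and exact signatures should be checked against the current docs.

```python
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "meta-llama/Llama-3.2-1B"  # placeholder source model
calibration = ["GPTQ calibrates on representative text like this."] * 256  # toy calibration set

config = QuantizeConfig(bits=4, group_size=128)  # 4-bit weights, 128-column groups
model = GPTQModel.load(model_id, config)
model.quantize(calibration)
model.save("Llama-3.2-1B-gptq-4bit")
```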
Aris-AI-Model-Server
An OpenAI-compatible API that integrates LLM, Embedding, and Reranker models
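Since the server speaks the OpenAI API, the standard client can request embeddings from it as well as chat completions; the base URL and model name below are placeholders.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-placeholder")
emb = client.embeddings.create(model="bge-m3", input=["hello world"])  # placeholder embedding model
print(len(emb.data[0].embedding))
```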