gptq topic

List gptq repositories

neural-compressor

2.2k
Stars
254
Forks

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

ialacol

142
Stars
17
Forks

🪶 Lightweight OpenAI drop-in replacement for Kubernetes

LLaMA-Cult-and-More

446
Stars
24
Forks

Large Language Models for All, 🦙 Cult and More, Stay in touch!

xllm

376
Stars
21
Forks

🦖 X—LLM: Cutting Edge & Easy LLM Finetuning

gptq_for_langchain

40
Stars
9
Forks

A guide about how to use GPTQ models with langchain

llm-api

158
Stars
25
Forks

Run any Large Language Model behind a unified API

zero-lora

30
Stars
3
Forks

zero: zero-training LLM parameter tuning

auto-round

222
Stars
19
Forks

Advanced Quantization Algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs"

GPTQModel

902
Stars
130
Forks
902
Watchers

LLM model quantization (compression) toolkit with hw acceleration support for Nvidia CUDA, AMD ROCm, Intel XPU and Intel/AMD/Apple CPU via HF, vLLM, and SGLang.

Aris-AI-Model-Server

18
Stars
1
Forks
18
Watchers

An OpenAI-compatible API that integrates LLM, Embedding, and Reranker.