awq topics

📖 A curated list of Awesome LLM Inference papers with code, covering TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

DefTruth

awq

ee-llm

flash-attention

flash-attention-2

Use PEFT or full-parameter training to fine-tune 350+ LLMs or 90+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Visio...

modelscope

agent

aigc

baichuan

chatglm

auto-round

222 stars, 19 forks

Advanced quantization algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs".

intel

awq

gptq

int4

neural-compressor

llmc

308 stars, 32 forks

[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".

ModelTC

benchmark

deployment

evaluation

large-language-models
