vllm topics

📖 A curated list of awesome LLM inference papers with code, covering TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

DefTruth

awq

ee-llm

flash-attention

flash-attention-2

Use PEFT or Full-parameter to finetune 350+ LLMs or 90+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Visio...

modelscope

agent

aigc

baichuan

chatglm

llm-vscode-inference-server

52 stars, 8 forks

An endpoint server for efficiently serving quantized open-source LLMs for code.

wangcx18

llm

llm-inference

vllm

vscode-extension

OpenRLHF

2.1k stars, 206 forks

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)

OpenRLHF

deepspeed

llm

ray

rlhf

booster

137 stars, 6 forks

Booster - open accelerator for LLM models. Better inference and debugging for AI hackers

gotzmann

alpaca

chatgpt

exllama

ggml

super-json-mode

382 stars, 12 forks

Low latency JSON generation using LLMs ⚡️

varunshenoy

huggingface-transformers

llm

openai

vllm

llama-recipes

14.8k stars, 2.1k forks, 193 watchers

Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting a...

meta-llama

ai

finetuning

langchain

llama