grpo topics

ms-swift

11.9k

Stars

1.1k

Forks

11.9k

Watchers

Use PEFT or full-parameter training to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi...

modelscope

agent

aigc

baichuan

chatglm

ART

8.1k

Stars

644

Forks

8.1k

Watchers

Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen2.5, Qwen3, Llama, and more!

OpenPipe

agent

agentic-ai

grpo

kimi-ai

OpenThinkIMG

324

Stars

6

Forks

324

Watchers

OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.

zhaochen0110

grpo

lvlm

reinforcement-learning

vision-tool

Travel-Agent-based-on-Qwen2-RLHF

36

Stars

2

Forks

36

Watchers

A travel agent based on Qwen2.5, fine-tuned with SFT + DPO/PPO/GRPO on a travel question-answer dataset; a mind map can be generated from the response. A RAG system is built upon the tuned qwen2, using...

NJUxlj

agent

dpo

grpo

langchain

427

Stars

15

Forks

427

Watchers

A collection of every awesome work about R1!

modelscope

collection

deepseek

grpo

o1