grpo topic

List grpo repositories

ms-swift

11.9k
Stars
1.1k
Forks
11.9k
Watchers

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi...

ART

8.1k
Stars
644
Forks
8.1k
Watchers

Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen2.5, Qwen3, Llama, and more!

OpenThinkIMG

324
Stars
6
Forks
324
Watchers

OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.

Travel-Agent-based-on-Qwen2-RLHF

36
Stars
2
Forks
36
Watchers

A travel agent based on Qwen2.5, fine-tuned with SFT + DPO/PPO/GRPO on a travel question-answer dataset; a mind map can be generated from the response. A RAG system is built upon the tuned qwen2, using...

AgentGuide

264
Stars
25
Forks
264
Watchers

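https://adongwanai.github.io/AgentGuide | AI Agent Development Guide | LangGraph in Practice | Advanced RAG | Career Transition to LLMs | LLM Interviews | Algorithm Engineer | Interview Question Bank | Reinforcement Learning | Data Synthesis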

Vision-SR1

142
Stars
18
Forks
142
Watchers

Reinforcement Learning of Vision Language Models with Self Visual Perception Reward

vlm-grpo

78
Stars
7
Forks
78
Watchers

An implementation of GRPO for training VLMs with Unsloth

judgeval

1.0k
Stars
87
Forks
1.0k
Watchers

The open source post-building layer for agents. Our environment data and evals power agent post-training (RL, SFT) and monitoring.

VisualThinker-R1-Zero

620
Stars
23
Forks
620
Watchers

Explore the Multimodal “Aha Moment” on a 2B Model

awesome-deep-reasoning

427
Stars
15
Forks
427
Watchers

A collection of awesome work on R1!