grpo topic
ms-swift
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi...
ART
Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen2.5, Qwen3, Llama, and more!
OpenThinkIMG
OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.
Travel-Agent-based-on-Qwen2-RLHF
A travel agent based on Qwen2.5, fine-tuned with SFT + DPO/PPO/GRPO on a travel question-answer dataset; a mind map can be generated from the response. A RAG system is built upon the tuned Qwen2, using...
AgentGuide
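https://adongwanai.github.io/AgentGuide | AI Agent Development Guide | LangGraph in Practice | Advanced RAG | Career Switch to LLMs | LLM Interviews | Algorithm Engineer | Interview Question Bank | Reinforcement Learning | Data Synthesis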
Vision-SR1
Reinforcement Learning of Vision Language Models with Self Visual Perception Reward
vlm-grpo
An implementation of GRPO for training Unsloth's VLMs
judgeval
The open source post-building layer for agents. Our environment data and evals power agent post-training (RL, SFT) and monitoring.
VisualThinker-R1-Zero
Explore the Multimodal “Aha Moment” on a 2B Model
awesome-deep-reasoning
A collection of awesome work about R1!