Collect every awesome work about R1!

Awesome-deep-reasoning

A collection of awesome works on reasoning models such as O1 and R1. The collection is also available at ModelScope-r1-collection | HuggingFace-r1-collection

Table of Contents

  • News
  • Highlights
  • Papers
  • Models
  • Infra
  • Datasets
  • Evaluation
  • RelatedRepos

News

  • 🔥 [2025.04.23] Add section "Advanced Reasoning for Agent", including Search-R1, ReSearch, R1-Searcher, ...
  • 🔥 [2025.03.21] Add DAPO - DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  • 🔥 [2025.03.18] Add Skywork-R1V - Pioneering Multimodal Reasoning with CoT
  • 🔥 [2025.03.17] Add START: Self-taught Reasoner with Tools from Qwen Team - START
  • 🔥 [2025.03.12] Add Multi-modal Reasoning datasets: LLaVA-R1-100k and MMMU-Reasoning-R1-Distill-Validation
  • 🔥 [2025.03.04] Add Visual-RFT - Visual Reinforcement Fine-Tuning
  • 🔥 [2025.03.01] DeepSeek has released smallpond - A lightweight data processing framework built on DuckDB and 3FS.
  • 🔥 [2025.02.28] DeepSeek has released 3FS - A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
  • 🔥 [2025.02.27] DeepSeek has released DualPipe - DualPipe achieves full overlap of forward and backward computation-communication phases, also reducing pipeline bubbles.
  • 🔥 [2025.02.27] DeepSeek has released ProfileData - The communication-computation overlap profiling strategies and low-level implementation details based on PyTorch.
  • 🔥 [2025.02.26] DeepSeek has released DeepGEMM - Clean and efficient FP8 GEMM kernels with fine-grained scaling
  • OpenAI publishes a deep-research capability.
  • OpenAI has launched its latest o3 models, o3-mini and o3-mini-high, which target science, math, and coding. Both models are available in the ChatGPT app, Poe, etc.
  • NVIDIA-NIM has supported the DeepSeek-R1 model.
  • Qwen has launched Qwen2.5-Max, a powerful multi-modal MoE model, available on the Bailian platform.
  • CodeGPT: VSCode co-pilot now supports R1.

Highlights

DeepSeek repos:

DeepSeek-R1 - DeepSeek-R1 official repository.

Qwen repos:

Qwen-QwQ - Qwen 2.5 official repository, with QwQ.

S1 from Stanford - From Fei-Fei Li's team, a distillation and test-time compute implementation reported to match the performance of O1 and R1.

Papers

2025.04

  • ReSearch - ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
  • Search-R1 - Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  • R1-Searcher - R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning

2025.03

2025.02

2025.01

2024

Blogs

Models

DeepSeek series:

Model ID | ModelScope | Hugging Face
DeepSeek R1 | Model Link | Model Link
DeepSeek V3 | Model Link | Model Link
DeepSeek-R1-Distill-Qwen-32B | Model Link | Model Link
DeepSeek-R1-Distill-Qwen-14B | Model Link | Model Link
DeepSeek-R1-Distill-Llama-8B | Model Link | Model Link
DeepSeek-R1-Distill-Qwen-7B | Model Link | Model Link
DeepSeek-R1-Distill-Qwen-1.5B | Model Link | Model Link
DeepSeek-R1-GGUF | Model Link | Model Link
DeepSeek-R1-Distill-Qwen-32B-GGUF | Model Link | Model Link
DeepSeek-R1-Distill-Llama-8B-GGUF | Model Link | Model Link

Qwen series:

Model ID | ModelScope | Hugging Face
QwQ-32B-Preview | Model Link | Model Link
QVQ-72B-Preview | Model Link | Model Link
QwQ-32B-Preview-GGUF | Model Link | Model Link
QVQ-72B-Preview-bnb-4bit | Model Link | Model Link

Others:

Model ID | ModelScope | Hugging Face
Qwen2-VL-2B-GRPO-8k | - | Model Link

Infra

  • Flash MLA [DeepSeek]: https://github.com/deepseek-ai/FlashMLA
    • FlashMLA is an efficient MLA decoding kernel for Hopper GPUs, optimized for serving variable-length sequences.
  • Open R1 by Hugging Face: https://github.com/huggingface/open-r1
    • Hugging Face's official repository for reproducing the training infrastructure of DeepSeek-R1
  • TinyZero: https://github.com/Jiayi-Pan/TinyZero
    • Clean, minimal, accessible reproduction of DeepSeek R1-Zero
  • SimpleRL-Reason: https://github.com/hkust-nlp/simpleRL-reason
    • Uses OpenRLHF to reproduce DeepSeek-R1
  • Ragen: https://github.com/ZihanWang314/RAGEN
    • A general-purpose reasoning agent training framework that also reproduces DeepSeek-R1
  • TRL: https://github.com/huggingface/trl
    • Hugging Face's official training framework, with open-source implementations of GRPO and other RL algorithms.
  • OpenRLHF: https://github.com/OpenRLHF/OpenRLHF
    • An RL repository that supports multiple RL algorithms, including REINFORCE++
  • Logic-RL: https://github.com/Unakar/Logic-RL
  • Align-Anything: https://github.com/PKU-Alignment/align-anything
    • Training All-modality Model with Feedback
  • R-Chain: A lightweight toolkit for distilling reasoning models
    • https://github.com/modelscope/r-chain
  • Math Verify: A robust mathematical expression evaluation system designed for assessing Large Language Model outputs in mathematical tasks.
    • https://github.com/huggingface/Math-Verify
  • EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
    • https://github.com/hiyouga/EasyR1
  • DeepGEMM - [DeepSeek] Clean and efficient FP8 GEMM kernels with fine-grained scaling
  • DualPipe - [DeepSeek] DualPipe achieves full overlap of forward and backward computation-communication phases, also reducing pipeline bubbles.
  • ProfileData - [DeepSeek] The communication-computation overlap profiling strategies and low-level implementation details based on PyTorch.
  • 3FS - [DeepSeek] A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
  • smallpond - [DeepSeek] A lightweight data processing framework built on DuckDB and 3FS.

Datasets

Evaluation

RelatedRepos

Replications of DeepSeek-R1 and DeepSeek-R1-Zero

  1. HuggingFace Open R1
  2. Simple Reinforcement Learning for Reasoning
  3. oatllm
  4. TinyZero
  5. 32B-DeepSeek-R1-Zero
  6. X-R1
  7. Open-Reasoner-Zero
  8. Logic-RL - Reproduce R1 Zero on Logic Puzzle

Advanced Reasoning for Coding

  1. SWE-RL - Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Advanced Reasoning for Multi-Modal

  1. R1-V - Multi-modal R1
  2. Open-R1-Multimodal - A multimodal reasoning model based on OpenR1
  3. R1-Multimodal-Journey - A journey to replicate a multimodal reasoning model based on Open-R1-Multimodal
  4. VLM-R1 | DEMO - A stable and generalizable R1-style Large Vision-Language Model
  5. Video-R1 - Towards Super Reasoning Ability in Video Understanding MLLMs
  6. VL-Thinking - An R1-Derived Visual Instruction Tuning Dataset for Thinkable LVLMs
  7. Open-R1-Multimodal - A fork to add multimodal model training to open-r1
  8. Visual-RFT - Visual Reinforcement Fine-Tuning
  9. Skywork-R1V
  10. R1-Omni - Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning
  11. R1-OneVision - A visual language model capable of deep CoT reasoning

Advanced Reasoning for Agent

  1. Search-R1 - An efficient, scalable RL training framework, based on veRL, for LLMs that interleave reasoning with search engine calls
  2. ReSearch - ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
  3. R1-Searcher - R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
  4. UI-TARS - Pioneering Automated GUI Interaction with Native Agents
