Awesome-deep-reasoning

A curated collection of awesome works on reasoning models such as o1 and R1. The collection is also available at ModelScope-r1-collection | HuggingFace-r1-collection

News
Highlights
Papers
Models
Infra
Datasets
Evaluation
RelatedRepos

News

🔥 [2025.04.23] Add the section "Advanced Reasoning for Agent", including Search-R1, ReSearch, R1-Searcher, ...
🔥 [2025.03.21] Add DAPO - DAPO: An Open-Source LLM Reinforcement Learning System at Scale
🔥 [2025.03.18] Add Skywork-R1V - Pioneering Multimodal Reasoning with CoT
🔥 [2025.03.17] Add START - START: Self-taught Reasoner with Tools, from the Qwen team
🔥 [2025.03.12] Add multi-modal reasoning datasets: LLaVA-R1-100k and MMMU-Reasoning-R1-Distill-Validation
🔥 [2025.03.04] Add Visual-RFT - Visual Reinforcement Fine-Tuning
🔥 [2025.03.01] DeepSeek has released smallpond - A lightweight data processing framework built on DuckDB and 3FS.
🔥 [2025.02.28] DeepSeek has released 3FS - A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
🔥 [2025.02.27] DeepSeek has released DualPipe - DualPipe achieves full overlap of forward and backward computation-communication phases, also reducing pipeline bubbles.
🔥 [2025.02.27] DeepSeek has released ProfileData - The communication-computation overlap profiling strategies and low-level implementation details based on PyTorch.
🔥 [2025.02.26] DeepSeek has released DeepGEMM - Clean and efficient FP8 GEMM kernels with fine-grained scaling
OpenAI publishes a deep-research capability.
OpenAI has launched its latest o3 models, o3-mini and o3-mini-high, which target science, math, and coding. Both models are available in the ChatGPT app, Poe, etc.
NVIDIA-NIM has supported the DeepSeek-R1 model.
Qwen has launched Qwen2.5-Max, a powerful multi-modal MoE model, which is available on the Bailian platform.
CodeGPT: VSCode co-pilot now supports R1.

Highlights

DeepSeek repos:

DeepSeek-R1 - DeepSeek-R1 official repository.

Qwen repos:

Qwen-QwQ - Qwen 2.5 official repository, with QwQ.

S1 from Stanford - From Fei-Fei Li's team, a distillation and test-time compute implementation reported to match the performance of o1 and R1.

Papers

2025.04

ReSearch - ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
Search-R1 - Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
R1-Searcher - R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning

2025.03

Visual-RFT - Visual Reinforcement Fine-Tuning
LLaVE - LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning
VisualPRM - VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
START - START: Self-taught Reasoner with Tools
DAPO - DAPO: An Open-Source LLM Reinforcement Learning System at Scale
What’s Behind PPO’s Collapse in Long-CoT? Value Optimization Holds the Secret
OThink-MR1 - Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning
Embodied Reasoner - Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks

2025.02

Visual Perception Token - Enhancing visual reasoning by enabling the LLM to control its perception process.
DeepSeek-V3 Tech-Report
LIMO - Less is More for Reasoning: Uses 817 samples to train a model that surpasses o1-level models.
Underthinking of Reasoning models - Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Competitive Programming with Large Reasoning Models - OpenAI: Competitive Programming with Large Reasoning Models
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
OverThink: Slowdown Attacks on Reasoning LLMs
Think Less, Achieve More: Cut Reasoning Costs by 50% Without Sacrificing Accuracy - Sky-T1-32B-Flash, a reasoning language model that significantly reduces overthinking
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention - (DeepSeek) NSA: A natively trainable Sparse Attention mechanism that integrates algorithmic innovations with hardware-aligned optimizations to achieve efficient long-context modeling.
MM-RLHF - MM-RLHF: The Next Step Forward in Multimodal LLM Alignment

2025.01

Imagine while Reasoning in Space: Multimodal Visualization-of-Thought - Multimodal Visualization-of-Thought (MVoT)
DeepSeek-R1-Tech-Report
Qwen-math-PRM-Tech-Report(MCTS/PRM)
Qwen2.5 Tech-Report
Kimi K1.5 Tech-Report
Qwen-Math-PRM - The Lessons of Developing Process Reward Models in Mathematical Reasoning
LlamaV-o1 - Rethinking Step-by-step Visual Reasoning in LLMs
rStar-Math - Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
LLMs Can Plan Only If We Tell Them - A new CoT method: AoT+
SFT Memorizes, RL Generalizes - Research from DeepMind on the effects of SFT and RL.

2024

Qwen QwQ Technical blog - QwQ: Reflect Deeply on the Boundaries of the Unknown
OpenAI-o1 Announcement - Learning to Reason with Large Language Models
DeepSeek Math Tech-Report(GRPO)
Large Language Models for Mathematical Reasoning: Progresses and Challenges (EACL 2024)
Large Language Models Cannot Self-Correct Reasoning Yet (ICLR 2024)
At Which Training Stage Does Code Data Help LLM Reasoning? (ICLR 2024)
DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought [ code ]
MathScale - Scaling Instruction Tuning for Mathematical Reasoning
Frontier AI systems have surpassed the self-replicating red line - A paper from Fudan University arguing that LLMs have surpassed the self-replicating red line.

Blogs

Models

DeepSeek series:

Model ID	ModelScope	Hugging Face
DeepSeek R1	Model Link	Model Link
DeepSeek V3	Model Link	Model Link
DeepSeek-R1-Distill-Qwen-32B	Model Link	Model Link
DeepSeek-R1-Distill-Qwen-14B	Model Link	Model Link
DeepSeek-R1-Distill-Llama-8B	Model Link	Model Link
DeepSeek-R1-Distill-Qwen-7B	Model Link	Model Link
DeepSeek-R1-Distill-Qwen-1.5B	Model Link	Model Link
DeepSeek-R1-GGUF	Model Link	Model Link
DeepSeek-R1-Distill-Qwen-32B-GGUF	Model Link	Model Link
DeepSeek-R1-Distill-Llama-8B-GGUF	Model Link	Model Link

Qwen series:

Model ID	ModelScope	Hugging Face
QwQ-32B-Preview	Model Link	Model Link
QVQ-72B-Preview	Model Link	Model Link
QwQ-32B-Preview-GGUF	Model Link	Model Link
QVQ-72B-Preview-bnb-4bit	Model Link	Model Link

Others:

Model ID	ModelScope	Hugging Face
Qwen2-VL-2B-GRPO-8k	-	Model Link

Infra

Flash MLA [DeepSeek]: https://github.com/deepseek-ai/FlashMLA
- FlashMLA is an efficient MLA decoding kernel for Hopper GPUs, optimized for serving variable-length sequences.
Open R1 by Hugging Face: https://github.com/huggingface/open-r1
- Hugging Face's official repository for reproducing the training infrastructure of DeepSeek-R1
TinyZero: https://github.com/Jiayi-Pan/TinyZero
- Clean, minimal, accessible reproduction of DeepSeek R1-Zero
SimpleRL-Reason: https://github.com/hkust-nlp/simpleRL-reason
- Uses OpenRLHF to reproduce DeepSeek-R1
Ragen: https://github.com/ZihanWang314/RAGEN
- A general-purpose reasoning agent training framework that reproduces DeepSeek-R1
TRL: https://github.com/huggingface/trl
- Hugging Face's official training framework, which supports GRPO and other RL algorithms.
OpenRLHF: https://github.com/OpenRLHF/OpenRLHF
- An RL framework that supports multiple RL algorithms, including REINFORCE++
Logic-RL: https://github.com/Unakar/Logic-RL
Align-Anything: https://github.com/PKU-Alignment/align-anything
- Training All-modality Model with Feedback
R-Chain: https://github.com/modelscope/r-chain
- A lightweight toolkit for distilling reasoning models
Math Verify: https://github.com/huggingface/Math-Verify
- A robust mathematical expression evaluation system designed for assessing Large Language Model outputs in mathematical tasks.
EasyR1: https://github.com/hiyouga/EasyR1
- An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
DeepGEMM - [DeepSeek] Clean and efficient FP8 GEMM kernels with fine-grained scaling
DualPipe - [DeepSeek] DualPipe achieves full overlap of forward and backward computation-communication phases, also reducing pipeline bubbles.
ProfileData - [DeepSeek] The communication-computation overlap profiling strategies and low-level implementation details based on PyTorch.
3FS - [DeepSeek] A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
smallpond - [DeepSeek] A lightweight data processing framework built on DuckDB and 3FS.

Datasets

OpenR1-Math-220k ModelScope | HuggingFace
OpenR1-Math-Raw ModelScope | HuggingFace
MathR - A dataset distilled from DeepSeek-R1 for NuminaMath hard-level problems.
Dolphin-R1 (HuggingFace | ModelScope) - An 800k-sample dataset for training DeepSeek-R1 distill models.
R1-Distill-SFT (HuggingFace | ModelScope)
NuminaMath-TIR - A dataset of math problems with tool-integrated reasoning (TIR) solutions.
NuminaMath-CoT - Approximately 860k math problems, where each solution is formatted in a Chain of Thought (CoT) manner.
BAAI-TACO - TACO is a benchmark for code generation with 26443 problems.
OpenThoughts-114k - Open synthetic reasoning dataset with 114k high-quality examples covering math, science, code, and puzzles!
Bespoke-Stratos-17k - A reasoning dataset of questions, reasoning traces, and answers.
Clevr_CoGenT_TrainA_R1 - A multi-modal dataset for training a multi-modal R1 model.
clevr_cogen_a_train - An R1-distilled visual reasoning dataset.
S1k - A dataset for training the S1 model.
Chinese dataset distilled from the full DeepSeek-R1 (110k) ModelScope | HuggingFace
LLaVA multi-modal reasoning dataset LLaVA-R1-100k ModelScope
MMMU multi-modal reasoning validation set distilled from the full R1 ModelScope

Evaluation

Best practice for evaluating R1/o1-like reasoning models
MATH-500 - A subset of 500 problems from the MATH benchmark that OpenAI created in their Let's Verify Step by Step paper
AIME-2024 - This dataset contains problems from the American Invitational Mathematics Examination (AIME) 2024.
AIME-2025: ModelScope | HuggingFace - Problems from the American Invitational Mathematics Examination (AIME) 2025-I, held February 6th, 2025.
AIME-VALIDATION - All 90 problems come from AIME 22, AIME 23, and AIME 24
MATH-LEVEL-4 - A subset of level 4 problems from the MATH benchmark.
MATH-LEVEL-5 - A subset of level 5 problems from the MATH benchmark.
aimo-validation-amc - All 83 samples come from AMC12 2022 and AMC12 2023
GPQA-Diamond - Diamond subset from GPQA benchmark.
Codeforces-Python-Submissions - A dataset of Python submissions from Codeforces.

RelatedRepos

Replications of DeepSeek-R1 and DeepSeek-R1-Zero

HuggingFace Open R1
Simple Reinforcement Learning for Reasoning
oatllm
TinyZero
32B-DeepSeek-R1-Zero
X-R1
Open-Reasoner-Zero
Logic-RL - Reproduce R1 Zero on Logic Puzzle

Advanced Reasoning for Coding

SWE-RL - Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Advanced Reasoning for Multi-Modal

R1-V - Multi-modal R1
Open-R1-Multimodal - A multimodal reasoning model based on OpenR1
R1-Multimodal-Journey - A journey to replicate multimodal reasoning model based on Open-R1-Multimodal
VLM-R1 | DEMO - A stable and generalizable R1-style Large Vision-Language Model
Video-R1 - Towards Super Reasoning Ability in Video Understanding MLLMs
VL-Thinking - An R1-Derived Visual Instruction Tuning Dataset for Thinkable LVLMs
Open-R1-Multimodal - A fork to add multimodal model training to open-r1
Visual-RFT - Visual Reinforcement Fine-Tuning
Skywork-R1V
R1-Omni - Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning
R1-OneVision - A visual language model capable of deep CoT reasoning

Advanced Reasoning for Agent

Search-R1 - An efficient, scalable RL training framework, based on veRL, for LLMs that interleave reasoning with search engine calls
ReSearch - ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
R1-Searcher - R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
UI-TARS - Pioneering Automated GUI Interaction with Native Agents
