awesome-model-based-RL
A curated list of awesome model-based RL resources (continually updated)
Awesome Model-Based Reinforcement Learning
This is a collection of research papers on model-based reinforcement learning (MBRL). The repository will be continuously updated to track the frontier of model-based RL.
Feel free to follow and star this repo!
[2022.07.06] New: We updated the ICML 2022 paper list of model-based RL!
[2022.02.13] We updated the ICLR 2022 paper list of model-based RL.
Table of Contents
- A Taxonomy of Model-Based RL Algorithms
- Papers
  - Classic Model-Based RL Papers
  - ICML 2022 (New!!!)
  - ICLR 2022
  - NeurIPS 2021
  - ICLR 2021
  - ICML 2021
- Contributing
A Taxonomy of Model-Based RL Algorithms
We’ll start this section with a disclaimer: it’s really quite hard to draw an accurate, all-encompassing taxonomy of algorithms in the Model-Based RL space, because the modularity of algorithms is not well-represented by a tree structure. We will therefore publish a series of related blog posts explaining more Model-Based RL algorithms.
We simply divide Model-Based RL into two categories: Learn the Model and Given the Model.
- Learn the Model mainly focuses on how to build the environment model.
- Given the Model cares about how to utilize the learned model.
The figure above shows some examples, and the numbered list below links to the algorithms in the taxonomy.
[1] World Models: Ha and Schmidhuber, 2018
[2] I2A (Imagination-Augmented Agents): Weber et al., 2017
[3] MBMF (Model-Based RL with Model-Free Fine-Tuning): Nagabandi et al., 2017
[4] MBVE (Model-Based Value Expansion): Feinberg et al., 2018
[5] ExIt (Expert Iteration): Anthony et al., 2017
[6] AlphaZero: Silver et al., 2017
[7] POPLIN (Model-Based Policy Planning): Wang et al., 2019
[8] M2AC (Masked Model-based Actor-Critic): Pan et al., 2020
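To make the two categories concrete, here is a minimal tabular Dyna-Q sketch in the spirit of Sutton's Dyna architecture (the first entry under Classic Model-Based RL Papers): real transitions are stored in a lookup-table model (Learn the Model), and simulated transitions replayed from that model drive extra Q-learning updates (Given the Model). The deterministic chain environment, the `dyna_q` function, and all hyperparameters are illustrative assumptions, not code from the paper.

```python
import random
from collections import defaultdict

N_STATES = 6       # hypothetical chain: states 0..5, state 5 is terminal
ACTIONS = (0, 1)   # 0 = move left, 1 = move right


def step(state, action):
    """Real environment (assumed): deterministic chain, reward 1 on reaching the right end."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def dyna_q(episodes=50, planning_steps=20, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(state, action)], initialised to 0
    model = {}              # learned model: (state, action) -> (reward, next_state, done)

    def greedy(s):
        # break ties at random so the untrained agent does not always pick the same action
        best = max(q[(s, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if q[(s, a)] == best])

    def q_update(s, a, r, s2, done):
        target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(ACTIONS) if rng.random() < eps else greedy(s)
            s2, r, done = step(s, a)          # interact with the real environment
            q_update(s, a, r, s2, done)       # direct RL update from real experience
            model[(s, a)] = (r, s2, done)     # "Learn the Model": record the transition
            for _ in range(planning_steps):   # "Given the Model": replay simulated experience
                ps, pa = rng.choice(list(model))
                q_update(ps, pa, *model[(ps, pa)])
            s = s2
    return q


q = dyna_q()
print([max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)])
```

Setting `planning_steps=0` reduces this to plain Q-learning, which is a quick way to see how much the model-generated updates contribute.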
Papers
Format:
- [title](paper link) [links]
- author1, author2, and author3.
- OpenReview: [scores, if public]
- Key: [key idea]
- ExpEnv: [experiment environment]
Classic Model-Based RL Papers
-
Dyna, an integrated architecture for learning, planning, and reacting
- Richard S. Sutton. ACM 1991
- Key: dyna architecture
- ExpEnv: None
-
PILCO: A Model-Based and Data-Efficient Approach to Policy Search
- Marc Peter Deisenroth, Carl Edward Rasmussen. ICML 2011
- Key: probabilistic dynamics model
- ExpEnv: cart-pole system, robotic unicycle
-
Learning Complex Neural Network Policies with Trajectory Optimization
- Sergey Levine, Vladlen Koltun. ICML 2014
- Key: guided policy search
- ExpEnv: mujoco
-
Learning Continuous Control Policies by Stochastic Value Gradients
- Nicolas Heess, Greg Wayne, David Silver, Timothy Lillicrap, Yuval Tassa, Tom Erez. NIPS 2015
- Key: backpropagation through paths + gradient on real trajectory
- ExpEnv: mujoco
-
- Junhyuk Oh, Satinder Singh, Honglak Lee. NIPS 2017
- Key: value-prediction model
- ExpEnv: collect domain, atari
-
Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion
- Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee. NIPS 2018
- Key: ensemble model and Qnet + value expansion
- ExpEnv: mujoco, roboschool
-
Recurrent World Models Facilitate Policy Evolution
- David Ha, Jürgen Schmidhuber. NIPS 2018
- Key: vae (representation) + rnn (predictive model)
- ExpEnv: car racing, vizdoom
-
Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
-
When to Trust Your Model: Model-Based Policy Optimization
- Michael Janner, Justin Fu, Marvin Zhang, Sergey Levine. NeurIPS 2019
- Key: ensemble model + sac + k-branched rollout
- ExpEnv: mujoco
-
Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees
- Yuping Luo, Huazhe Xu, Yuanzhi Li, Yuandong Tian, Trevor Darrell, Tengyu Ma. ICLR 2019
- Key: Discrepancy Bounds Design + ME-TRPO with multi-step + Entropy regularization
- ExpEnv: mujoco
-
Model-Ensemble Trust-Region Policy Optimization
- Thanard Kurutach, Ignasi Clavera, Yan Duan, Aviv Tamar, Pieter Abbeel. ICLR 2018
- Key: ensemble model + TRPO
- ExpEnv: mujoco
-
Dream to Control: Learning Behaviors by Latent Imagination
- Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi. ICLR 2020
- Key: latent space imagination
- ExpEnv: deepmind control suite, atari, deepmind lab
-
Exploring Model-based Planning with Policy Networks
- Tingwu Wang, Jimmy Ba. ICLR 2020
- Key: model-based policy planning in action space and parameter space
- ExpEnv: mujoco
-
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
- Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver. Nature 2020
- Key: MCTS + value equivalence
- ExpEnv: chess, shogi, go, atari
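MBVE [4] in the taxonomy and Stochastic Ensemble Value Expansion (STEVE) above both use a learned model to build H-step value targets: roll the model forward H steps under the current policy, sum the discounted imagined rewards, and bootstrap with the learned value at the last imagined state. A minimal sketch of that target computation follows; `model`, `policy`, and `value_fn` are hypothetical callables standing in for the learned networks, and the details of how each paper trains on these targets are not reproduced.

```python
def value_expansion_target(state, model, policy, value_fn, horizon, gamma=0.99):
    """H-step value-expansion target: sum_{t<H} gamma^t * r_t + gamma^H * V(s_H).

    model(s, a) -> (next_state, reward), policy(s) -> action and value_fn(s) -> float
    are hypothetical callables standing in for the learned networks.
    """
    target, discount, s = 0.0, 1.0, state
    for _ in range(horizon):
        a = policy(s)
        s, r = model(s, a)          # imagined transition from the learned model
        target += discount * r      # discounted imagined reward
        discount *= gamma
    return target + discount * value_fn(s)  # bootstrap from the learned value at s_H


# Toy usage: every imagined step gives reward 1 and the value estimate is always 10.
toy_model = lambda s, a: (s + 1, 1.0)
y = value_expansion_target(0, toy_model, policy=lambda s: 0,
                           value_fn=lambda s: 10.0, horizon=2, gamma=0.5)
print(y)  # 1 + 0.5 * 1 + 0.25 * 10 = 4.0
```

With `horizon=0` the target reduces to the ordinary bootstrapped estimate `value_fn(state)`; STEVE goes further and mixes targets of several horizons, weighting them by ensemble uncertainty.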
ICML 2022
-
DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations
- Fei Deng, Ingook Jang, Sungjin Ahn
- Key: dreamer + prototypes
- ExpEnv: deepmind control suite
-
Denoised MDPs: Learning World Models Better Than the World Itself
- Tongzhou Wang, Simon Du, Antonio Torralba, Phillip Isola, Amy Zhang, Yuandong Tian
- Key: representation learning + denoised model
- ExpEnv: deepmind control suite, RoboDesk
-
- Qi Wang, Herke van Hoof
- Key: graph structured surrogate model + meta training
- ExpEnv: atari, mujoco
-
Towards Adaptive Model-Based Reinforcement Learning
- Yi Wan, Ali Rahimi-Kalahroudi, Janarthanan Rajendran, Ida Momennejad, Sarath Chandar, Harm van Seijen
- Key: local change adaptation
- ExpEnv: GridWorldLoCA, ReacherLoCA, MountaincarLoCA
-
Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation
- Pier Giuseppe Sessa, Maryam Kamgarpour, Andreas Krause
- Key: model-based multi-agent + confidence bound
- ExpEnv: SMARTS
-
- Shentao Yang, Yihao Feng, Shujian Zhang, Mingyuan Zhou
- Key: offline rl + model-based rl + stationary distribution regularization
- ExpEnv: d4rl
-
Design-Bench: Benchmarks for Data-Driven Offline Model-Based Optimization
- Brandon Trabucco, Xinyang Geng, Aviral Kumar, Sergey Levine
- Key: benchmark + offline MBO
- ExpEnv: Design-Bench Benchmark Tasks
-
Temporal Difference Learning for Model Predictive Control
- Nicklas Hansen, Hao Su, Xiaolong Wang
- Key: td-learning + MPC
- ExpEnv: deepmind control suite, Meta-World
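Temporal Difference Learning for Model Predictive Control (above) plans with model predictive control (MPC): at each step, candidate action sequences are scored under the model, only the first action of the best sequence is executed, and planning restarts from the next state; TD-MPC additionally adds a learned terminal value to each candidate's return. A minimal random-shooting version of that receding-horizon loop is sketched below. The point-mass `dynamics` and `reward` functions stand in for a learned model and are assumptions, and random shooting is a deliberately simple substitute for the MPPI optimizer TD-MPC uses in its learned latent space.

```python
import random


def dynamics(state, action, dt=0.1):
    """Hypothetical stand-in for a learned model: 1-D point mass (position, velocity)."""
    pos, vel = state
    vel = vel + dt * action
    pos = pos + dt * vel
    return (pos, vel)


def reward(state, action):
    """Hypothetical reward: stay near the origin with little velocity and effort."""
    pos, vel = state
    return -(pos ** 2) - 0.1 * vel ** 2 - 0.01 * action ** 2


def random_shooting_mpc(state, horizon=15, n_candidates=200, a_max=1.0,
                        terminal_value=None, rng=None):
    """Return the first action of the best random action sequence under the model."""
    rng = rng or random.Random(0)
    best_return, best_first_action = float("-inf"), 0.0
    for _ in range(n_candidates):
        actions = [rng.uniform(-a_max, a_max) for _ in range(horizon)]
        s, total = state, 0.0
        for a in actions:                  # imagined rollout inside the model
            total += reward(s, a)
            s = dynamics(s, a)
        if terminal_value is not None:     # optional bootstrap beyond the horizon
            total += terminal_value(s)
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action               # only the first action is executed


# Receding-horizon loop; here the "real" system is the model itself.
rng = random.Random(0)
state = (1.0, 0.0)
for _ in range(100):
    state = dynamics(state, random_shooting_mpc(state, rng=rng))
print(state)
```

Passing `terminal_value` (for example a learned value function) lets the planner account for rewards beyond the short horizon, which is the role the terminal value plays in TD-MPC.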
ICLR 2022
-
Revisiting Design Choices in Offline Model Based Reinforcement Learning
- Cong Lu, Philip Ball, Jack Parker-Holder, Michael Osborne, Stephen J. Roberts
- Key: model-based offline + uncertainty quantification
- OpenReview: 8, 8, 6, 6, 6
- ExpEnv: d4rl dataset
-
Value Gradient weighted Model-Based Reinforcement Learning
- Claas A Voelcker, Victor Liao, Animesh Garg, Amir-massoud Farahmand
- Key: Value-Gradient weighted Model loss
- OpenReview: 8, 8, 6, 6
- ExpEnv: mujoco
-
Planning in Stochastic Environments with a Learned Model
- Ioannis Antonoglou, Julian Schrittwieser, Sherjil Ozair, Thomas K Hubert, David Silver
- Key: MCTS + stochastic MuZero
- OpenReview: 10, 8, 8, 5
- ExpEnv: 2048 game, Backgammon, Go
-
Policy improvement by planning with Gumbel
- Ivo Danihelka, Arthur Guez, Julian Schrittwieser, David Silver
- Key: Gumbel AlphaZero + Gumbel MuZero
- OpenReview: 8, 8, 8, 6
- ExpEnv: go, chess, atari
-
Model-Based Offline Meta-Reinforcement Learning with Regularization
- Sen Lin, Jialin Wan, Tengyu Xu, Yingbin Liang, Junshan Zhang
- Key: model-based offline Meta-RL
- OpenReview: 8, 6, 6, 6
- ExpEnv: d4rl dataset
-
Information Prioritization through Empowerment in Visual Model-based RL
- Homanga Bharadhwaj, Mohammad Babaeizadeh, Dumitru Erhan, Sergey Levine
- Key: mutual information + visual model-based RL
- OpenReview: 8, 8, 8, 6
- ExpEnv: deepmind control suite, Kinetics dataset
-
Transfer RL across Observation Feature Spaces via Model-Based Regularization
- Yanchao Sun, Ruijie Zheng, Xiyao Wang, Andrew E Cohen, Furong Huang
- Key: latent dynamics model + transfer RL
- OpenReview: 8, 6, 5, 5
- ExpEnv: CartPole, Acrobot and Cheetah-Run, mujoco, 3DBall
-
Learning State Representations via Retracing in Reinforcement Learning
- Changmin Yu, Dong Li, Jianye Hao, Jun Wang, Neil Burgess
- Key: representation learning + learning via retracing
- OpenReview: 8, 6, 5, 3
- ExpEnv: deepmind control suite
-
Model-augmented Prioritized Experience Replay
- Youngmin Oh, Jinwoo Shin, Eunho Yang, Sung Ju Hwang
- Key: prioritized experience replay + mbrl
- OpenReview: 8, 8, 6, 5
- ExpEnv: pybullet
-
Evaluating Model-Based Planning and Planner Amortization for Continuous Control
- Arunkumar Byravan, Leonard Hasenclever, Piotr Trochim, Mehdi Mirza, Alessandro Davide Ialongo, Yuval Tassa, Jost Tobias Springenberg, Abbas Abdolmaleki, Nicolas Heess, Josh Merel, Martin Riedmiller
- Key: model predictive control
- OpenReview: 8, 6, 6, 6
- ExpEnv: mujoco
-
Gradient Information Matters in Policy Optimization by Back-propagating through Model
- Chongchong Li, Yue Wang, Wei Chen, Yuting Liu, Zhi-Ming Ma, Tie-Yan Liu
- Key: two-model-based method + analysis of model error and policy gradient
- OpenReview: 8, 8, 6, 6
- ExpEnv: mujoco
-
Pareto Policy Pool for Model-based Offline Reinforcement Learning
- Yijun Yang, Jing Jiang, Tianyi Zhou, Jie Ma, Yuhui Shi
- Key: model-based offline + model return-uncertainty trade-off
- OpenReview: 8, 8, 6, 5
- ExpEnv: d4rl dataset
-
Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage
- Masatoshi Uehara, Wen Sun
- Key: model-based offline theory + PAC bounds
- OpenReview: 8, 6, 6, 5
- ExpEnv: None
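Policy improvement by planning with Gumbel (above) samples root actions with the Gumbel-Top-k trick: adding i.i.d. Gumbel(0, 1) noise to the policy logits and keeping the k largest perturbed values draws k distinct actions without replacement from softmax(logits). A minimal sketch of just this sampling step follows, assuming plain Python lists of logits; the paper's Sequential Halving over the sampled actions and its policy-improvement target are not reproduced here.

```python
import math
import random


def gumbel_top_k(logits, k, rng=None):
    """Sample k distinct action indices without replacement from softmax(logits)."""
    rng = rng or random.Random()
    perturbed = []
    for i, logit in enumerate(logits):
        u = rng.random()                  # U ~ Uniform[0, 1)
        while u == 0.0:                   # avoid log(0)
            u = rng.random()
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise via the inverse CDF
        perturbed.append((logit + gumbel, i))
    perturbed.sort(reverse=True)          # largest perturbed logits first
    return [i for _, i in perturbed[:k]]


print(gumbel_top_k([0.1, 2.0, -1.0, 0.5], k=2, rng=random.Random(0)))
```

With `k=1` this reduces to the Gumbel-max trick, i.e. ordinary sampling from the softmax policy.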
NeurIPS 2021
-
On Effective Scheduling of Model-based Reinforcement Learning
-
Safe Reinforcement Learning by Imagining the Near Future
- Garrett Thomas, Yuping Luo, Tengyu Ma
- Key: safe rl + reward penalty + theory about model-based rollouts
- OpenReview: 8, 6, 6
- ExpEnv: mujoco
-
Model-Based Reinforcement Learning via Imagination with Derived Memory
- Yao Mu, Yuzheng Zhuang, Bin Wang, Guangxiang Zhu, Wulong Liu, Jianyu Chen, Ping Luo, Shengbo Eben Li, Chongjie Zhang, Jianye Hao
- Key: extension of dreamer + prediction-reliability weight
- OpenReview: 6, 6, 6, 6
- ExpEnv: deepmind control suite
-
MobILE: Model-Based Imitation Learning From Observation Alone
-
Model-Based Episodic Memory Induces Dynamic Hybrid Controls
- Hung Le, Thommen Karimpanal George, Majid Abdolshah, Truyen Tran, Svetha Venkatesh
- Key: model-based + episodic control
- OpenReview: 7, 7, 6, 6
- ExpEnv: 2D maze navigation, cartpole, mountainCar and lunarlander, atari, 3D navigation: gym-miniworld
-
A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning
- Mingde Zhao, Zhen Liu, Sitao Luan, Shuyuan Zhang, Doina Precup, Yoshua Bengio
- Key: mbrl + set representation
- OpenReview: 7, 7, 7, 6
- ExpEnv: MiniGrid-BabyAI framework
-
Mastering Atari Games with Limited Data
- Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel, Yang Gao
- Key: muzero + self-supervised consistency loss
- OpenReview: 7, 7, 7, 5
- ExpEnv: atari 100k, deepmind control suite
-
Online and Offline Reinforcement Learning by Planning with a Learned Model
- Julian Schrittwieser, Thomas K Hubert, Amol Mandhane, Mohammadamin Barekatain, Ioannis Antonoglou, David Silver
- Key: muzero + reanalyse + offline
- OpenReview: 8, 8, 7, 6
- ExpEnv: atari dataset, deepmind control suite dataset
-
Self-Consistent Models and Values
- Gregory Farquhar, Kate Baumli, Zita Marinho, Angelos Filos, Matteo Hessel, Hado van Hasselt, David Silver
- Key: new way of learning models and values that are self-consistent
- OpenReview: 7, 7, 7, 6
- ExpEnv: tabular MDP, Sokoban, atari
-
- Christopher Grimm, Andre Barreto, Gregory Farquhar, David Silver, Satinder Singh
- Key: value equivalence + value-based planning + muzero
- OpenReview: 8, 7, 7, 6
- ExpEnv: four rooms, atari
-
MOPO: Model-based Offline Policy Optimization
- Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma
- Key: model-based + offline
- OpenReview: None
- ExpEnv: d4rl dataset, halfcheetah-jump and ant-angle
-
RoMA: Robust Model Adaptation for Offline Model-based Optimization
- Sihyun Yu, Sungsoo Ahn, Le Song, Jinwoo Shin
- Key: model-based + offline
- OpenReview: 7, 6, 6
- ExpEnv: design-bench
-
Offline Reinforcement Learning with Reverse Model-based Imagination
- Jianhao Wang, Wenzhe Li, Haozhe Jiang, Guangxiang Zhu, Siyuan Li, Chongjie Zhang
- Key: model-based + offline
- OpenReview: 7, 6, 6, 5
- ExpEnv: d4rl dataset
-
Offline Model-based Adaptable Policy Learning
- Xiong-Hui Chen, Yang Yu, Qingyang Li, Fan-Ming Luo, Zhiwei Tony Qin, Shang Wenjie, Jieping Ye
- Key: model-based + offline
- OpenReview: 6, 6, 6, 4
- ExpEnv: d4rl dataset
-
Weighted model estimation for offline model-based reinforcement learning
- Toru Hishinuma, Kei Senda
- Key: model-based + offline
- OpenReview: 7, 6, 6, 6
- ExpEnv: pendulum, d4rl dataset
-
Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation
- Weitong Zhang, Dongruo Zhou, Quanquan Gu
- Key: learning theory + model-based reward-free RL + linear function approximation
- OpenReview: 6, 6, 5, 5
- ExpEnv: None
-
- Kefan Dong, Jiaqi Yang, Tengyu Ma
- Key: learning theory + model-based bandit RL + nonlinear function approximation
- OpenReview: 7, 7, 7, 6
- ExpEnv: None
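MOPO (listed above) and several offline entries in this section rely on uncertainty-penalized rewards: the policy is trained on model rollouts with r_tilde(s, a) = r_hat(s, a) - lambda * u(s, a), where u(s, a) estimates model error from a probabilistic ensemble. MOPO's paper defines u as the largest Frobenius norm of the predicted covariance across ensemble members; the sketch below swaps in the largest L2 norm of the predicted standard-deviation vectors, a simple diagonal-Gaussian heuristic, so treat it as an approximation rather than the reference implementation. The function name and the default `lam` are assumptions.

```python
import math


def mopo_penalized_reward(reward, ensemble_stds, lam=1.0):
    """Uncertainty-penalized reward in the spirit of MOPO (hypothetical helper).

    reward:        model-predicted reward r_hat(s, a)
    ensemble_stds: one list of per-dimension predicted std devs per ensemble member
    lam:           penalty coefficient lambda (the default here is an assumption)
    """
    # u(s, a): largest L2 norm of the predicted std vector across ensemble members
    u = max(math.sqrt(sum(sd * sd for sd in stds)) for stds in ensemble_stds)
    return reward - lam * u


print(mopo_penalized_reward(1.0, [[0.3, 0.4], [0.0, 0.1]], lam=2.0))
```

A larger `lam` makes the learned policy more conservative in regions where the ensemble members disagree or predict high variance.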
ICLR 2021
-
Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
- Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, Shixiang Gu
- Key: model-based + behavior cloning (warmup) + trpo
- OpenReview: 8, 7, 7, 5
- ExpEnv: d4rl dataset
-
Control-Aware Representations for Model-based Reinforcement Learning
- Brandon Cui, Yinlam Chow, Mohammad Ghavamzadeh
- Key: representation learning + model-based soft actor-critic
- OpenReview: 6, 6, 6
- ExpEnv: planar system, inverted pendulum (swing-up), cartpole, 3-link manipulator (swing-up and balance)
-
Mastering Atari with Discrete World Models
- Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba
- Key: Dreamer V1 + many tricks (multiple categorical variables, KL balancing, etc.)
- OpenReview: 9, 8, 5, 4
- ExpEnv: atari
-
Model-Based Visual Planning with Self-Supervised Functional Distances
- Stephen Tian, Suraj Nair, Frederik Ebert, Sudeep Dasari, Benjamin Eysenbach, Chelsea Finn, Sergey Levine
- Key: goal-reaching task + dynamics learning + distance learning (goal-conditioned Q-function)
- OpenReview: 7, 7, 7, 7
- ExpEnv: sawyer, door sliding
-
- Arthur Argenson, Gabriel Dulac-Arnold
- Key: model-based + offline
- OpenReview: 8, 7, 5, 5
- ExpEnv: RL Unplugged (RLU), d4rl dataset
-
Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation
- Justin Fu, Sergey Levine
- Key: model-based + offline
- OpenReview: 8, 6, 6
- ExpEnv: design-bench
-
On the role of planning in model-based deep reinforcement learning
- Jessica B. Hamrick, Abram L. Friesen, Feryal Behbahani, Arthur Guez, Fabio Viola, Sims Witherspoon, Thomas Anthony, Lars Buesing, Petar Veličković, Théophane Weber
- Key: discussion about planning in MuZero
- OpenReview: 7, 7, 6, 5
- ExpEnv: atari, go, deepmind control suite
-
Representation Balancing Offline Model-based Reinforcement Learning
- Byung-Jun Lee, Jongmin Lee, Kee-Eung Kim
- Key: Representation Balancing MDP + model-based + offline
- OpenReview: 7, 7, 7, 6
- ExpEnv: d4rl dataset
-
- Balázs Kégl, Gabriel Hurtado, Albert Thomas
- Key: mixture density nets + heteroscedasticity
- OpenReview: 7, 7, 7, 6, 5
- ExpEnv: acrobot system
ICML 2021
-
Conservative Objective Models for Effective Offline Model-Based Optimization
- Brandon Trabucco, Aviral Kumar, Xinyang Geng, Sergey Levine
- Key: conservative objective model + offline mbrl
- ExpEnv: design-bench
-
Continuous-Time Model-Based Reinforcement Learning
- Çağatay Yıldız, Markus Heinonen, Harri Lähdesmäki
- Key: continuous-time
- ExpEnv: pendulum, cartPole and acrobot
-
Model-Based Reinforcement Learning via Latent-Space Collocation
- Oleh Rybkin, Chuning Zhu, Anusha Nagabandi, Kostas Daniilidis, Igor Mordatch, Sergey Levine
- Key: latent space collocation
- ExpEnv: sparse metaworld tasks
-
Model-Free and Model-Based Policy Evaluation when Causality is Uncertain
- David A Bruns-Smith
- Key: worst-case bounds
- ExpEnv: ope-tools
-
Muesli: Combining Improvements in Policy Optimization
- Matteo Hessel, Ivo Danihelka, Fabio Viola, Arthur Guez, Simon Schmitt, Laurent Sifre, Theophane Weber, David Silver, Hado van Hasselt
- Key: value equivalence
- ExpEnv: atari
-
Vector Quantized Models for Planning
- Sherjil Ozair, Yazhe Li, Ali Razavi, Ioannis Antonoglou, Aäron van den Oord, Oriol Vinyals
- Key: VQVAE + MCTS
- ExpEnv: chess datasets, DeepMind Lab
-
PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration
- Yuda Song, Wen Sun
- Key: sample complexity + kernelized nonlinear regulators + linear MDPs
- ExpEnv: mountain car, antmaze, mujoco
-
Temporal Predictive Coding For Model-Based Planning In Latent Space
- Tung Nguyen, Rui Shu, Tuan Pham, Hung Bui, Stefano Ermon
- Key: temporal predictive coding with an RSSM + latent space
- ExpEnv: deepmind control suite
-
Model-based Reinforcement Learning for Continuous Control with Posterior Sampling
- Ying Fan, Yifei Ming
- Key: regret bound of psrl + mpc
- ExpEnv: continuous cartpole, pendulum swingup, mujoco
-
A Sharp Analysis of Model-based Reinforcement Learning with Self-Play
- Qinghua Liu, Tiancheng Yu, Yu Bai, Chi Jin
- Key: learning theory + multi-agent + model-based self play + two-player zero-sum Markov games
- ExpEnv: None
Contributing
Our goal is to make this repo even better. If you are interested in contributing, please refer to HERE for contribution instructions.
License
Awesome Model-Based RL is released under the Apache 2.0 license.