For deep RL and the future of AI.

Awesome Deep Reinforcement Learning

July 2022 update: EDDICT added

Mar 2022 update: a few papers released in early 2022

Dec 2021 update: Unsupervised RL

Introduction to awesome drl

Reinforcement learning is the fundamental framework for building AGI. We therefore collect important contributions to deep RL in this awesome drl project.

Landscape of Deep RL

Figure: updated landscape of DRL

Content

  • Awesome Deep Reinforcement Learning
    • Introduction to awesome drl
    • Landscape of Deep RL
    • Content
    • General guidances
    • 2022
    • Foundations and theory
    • General benchmark frameworks
    • Unsupervised
    • Offline
    • Value based
    • Policy gradient
    • Explorations
    • Actor-Critic
    • Model-based
    • Model-free + Model-based
    • Hierarchical
    • Option
    • Connection with other methods
    • Connecting value and policy methods
    • Reward design
    • Unifying
    • Faster DRL
    • Multi-agent
    • New design
    • Multitask
    • Observational Learning
    • Meta Learning
    • Distributional
    • Planning
    • Safety
    • Inverse RL
    • No reward RL
    • Time
    • Adversarial learning
    • Use Natural Language
    • Generative and contrastive representation learning
    • Belief
    • PAC
    • Applications

Illustrations:

Recommendations and suggestions are welcome.

General guidances

2022

  • Reinforcement Learning with Action-Free Pre-Training from Videos arxiv repo

Foundations and theory

  • General non-linear Bellman equations 9 July 2019 arxiv
  • Monte Carlo Gradient Estimation in Machine Learning 25 Jun 2019 arxiv

General benchmark frameworks

Unsupervised

Offline

Value based

Policy gradient

  • Phasic Policy Gradient 9 Sep 2020 arxiv code
  • An operator view of policy gradient methods 22 Jun 2020 arxiv
  • Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces 14 Jun 2019 arxiv
  • Policy Gradient Search: Online Planning and Expert Iteration without Search Trees 7 Apr 2019 arxiv
  • Supervised Policy Update for Deep Reinforcement Learning 24 Dec 2018 arxiv
  • PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation 5 Oct 2018 arxiv
  • Clipped Action Policy Gradient 22 Jun 2018
  • Expected Policy Gradients for Reinforcement Learning 10 Jan 2018
  • Proximal Policy Optimization Algorithms 20 July 2017
  • Emergence of Locomotion Behaviours in Rich Environments 7 July 2017
  • Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning 1 Jun 2017
  • Equivalence Between Policy Gradients and Soft Q-Learning
  • Trust Region Policy Optimization
  • Reinforcement Learning with Deep Energy-Based Policies
  • Q-Prop: Sample-Efficient Policy Gradient with an Off-Policy Critic

Explorations

  • Entropic Desired Dynamics for Intrinsic Control 2021 openreview
  • Self-Supervised Exploration via Disagreement 10 Jun 2019 arxiv
  • Approximate Exploration through State Abstraction 24 Jan 2019
  • The Uncertainty Bellman Equation and Exploration 15 Sep 2017
  • Noisy Networks for Exploration 30 Jun 2017 implementation
  • Count-Based Exploration in Feature Space for Reinforcement Learning 25 Jun 2017
  • Count-Based Exploration with Neural Density Models 14 Jun 2017
  • UCB and InfoGain Exploration via Q-Ensembles 11 Jun 2017
  • Minimax Regret Bounds for Reinforcement Learning 16 Mar 2017
  • Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
  • EX2: Exploration with Exemplar Models for Deep Reinforcement Learning

Actor-Critic

  • Generalized Off-Policy Actor-Critic 27 Mar 2019
  • Soft Actor-Critic Algorithms and Applications 29 Jan 2019
  • The Reactor: A Sample-Efficient Actor-Critic Architecture 15 Apr 2017
  • Sample Efficient Actor-Critic with Experience Replay
  • Reinforcement Learning with Unsupervised Auxiliary Tasks
  • Continuous control with deep reinforcement learning

Model-based

  • Self-Consistent Models and Values 25 Oct 2021 arxiv
  • When to use parametric models in reinforcement learning? 12 Jun 2019 arxiv
  • Model-Based Reinforcement Learning for Atari 5 Mar 2019
  • Model-Based Stabilisation of Deep Reinforcement Learning 6 Sep 2018
  • Learning model-based planning from scratch 19 July 2017

Model-free + Model-based

  • Imagination-Augmented Agents for Deep Reinforcement Learning 19 July 2017

Hierarchical

  • Why Does Hierarchy (Sometimes) Work So Well in Reinforcement Learning? 23 Sep 2019 arxiv
  • Language as an Abstraction for Hierarchical Deep Reinforcement Learning 18 Jun 2019 arxiv

Option

  • Variational Option Discovery Algorithms 26 July 2018
  • A Laplacian Framework for Option Discovery in Reinforcement Learning 16 Jun 2017

Connection with other methods

  • Robust Imitation of Diverse Behaviors
  • Learning human behaviors from motion capture by adversarial imitation
  • Connecting Generative Adversarial Networks and Actor-Critic Methods

Connecting value and policy methods

  • Bridging the Gap Between Value and Policy Based Reinforcement Learning
  • Combining Policy Gradient and Q-learning

Reward design

  • End-to-End Robotic Reinforcement Learning without Reward Engineering 16 Apr 2019 arxiv
  • Reinforcement Learning with a Corrupted Reward Channel 23 May 2017

Unifying

  • Multi-step Reinforcement Learning: A Unifying Algorithm

Faster DRL

  • Neural Episodic Control

Multi-agent

  • No Press Diplomacy: Modeling Multi-Agent Gameplay 4 Sep 2019 arxiv
  • Options as responses: Grounding behavioural hierarchies in multi-agent RL 6 Jun 2019 arxiv
  • Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination 18 Jun 2019 arxiv
  • A Regularized Opponent Model with Maximum Entropy Objective 17 May 2019 arxiv
  • Deep Q-Learning for Nash Equilibria: Nash-DQN 23 Apr 2019 arxiv
  • Malthusian Reinforcement Learning 3 Mar 2019 arxiv
  • Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning 4 Nov 2018
  • Intrinsic Social Motivation via Causal Influence in Multi-Agent RL 19 Oct 2018
  • QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning 30 Mar 2018
  • Modeling Others using Oneself in Multi-Agent Reinforcement Learning 26 Feb 2018
  • The Mechanics of n-Player Differentiable Games 15 Feb 2018
  • Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments 10 Oct 2017
  • Learning with Opponent-Learning Awareness 13 Sep 2017
  • Counterfactual Multi-Agent Policy Gradients
  • Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments 7 Jun 2017
  • Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games 29 Mar 2017

New design

Multitask

  • Kickstarting Deep Reinforcement Learning 10 Mar 2018
  • Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning 7 Nov 2017
  • Distral: Robust Multitask Reinforcement Learning 13 July 2017

Observational Learning

  • Observational Learning by Reinforcement Learning 20 Jun 2017

Meta Learning

  • Discovery of Useful Questions as Auxiliary Tasks 10 Sep 2019 arxiv
  • Meta-learning of Sequential Strategies 8 May 2019 arxiv
  • Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables 19 Mar 2019 arxiv
  • Some Considerations on Learning to Explore via Meta-Reinforcement Learning 11 Jan 2019 arxiv
  • Meta-Gradient Reinforcement Learning 24 May 2018 arxiv
  • ProMP: Proximal Meta-Policy Search 16 Oct 2018 arxiv
  • Unsupervised Meta-Learning for Reinforcement Learning 12 Jun 2018

Distributional

  • GAN Q-learning 20 July 2018
  • Implicit Quantile Networks for Distributional Reinforcement Learning 14 Jun 2018
  • Nonlinear Distributional Gradient Temporal-Difference Learning 20 May 2018
  • Distributed Distributional Deterministic Policy Gradients 23 Apr 2018
  • An Analysis of Categorical Distributional Reinforcement Learning 22 Feb 2018
  • Distributional Reinforcement Learning with Quantile Regression 27 Oct 2017
  • A Distributional Perspective on Reinforcement Learning 21 July 2017

Planning

  • Search on the Replay Buffer: Bridging Planning and Reinforcement Learning 12 Jun 2019 arxiv

Safety

  • Robust Reinforcement Learning for Continuous Control with Model Misspecification 18 Jun 2019 arxiv
  • Verifiable Reinforcement Learning via Policy Extraction 22 May 2018 arxiv

Inverse RL

  • Addressing Sample Inefficiency and Reward Bias in Inverse Reinforcement Learning 9 Sep 2018

No reward RL

Time

  • Interval timing in deep reinforcement learning agents 31 May 2019 arxiv
  • Time Limits in Reinforcement Learning

Adversarial learning

  • Sample-efficient Adversarial Imitation Learning from Observation 18 Jun 2019 arxiv

Use Natural Language

  • Using Natural Language for Reward Shaping in Reinforcement Learning 31 May 2019 arxiv

Generative and contrastive representation learning

  • Unsupervised State Representation Learning in Atari 19 Jun 2019 arxiv

Belief

  • Shaping Belief States with Generative Environment Models for RL 24 Jun 2019 arxiv

PAC

  • Provably Convergent Off-Policy Actor-Critic with Function Approximation 11 Nov 2019 arxiv

Applications

  • Benchmarks for Deep Off-Policy Evaluation 30 Mar 2021 arxiv
  • Learning Reciprocity in Complex Sequential Social Dilemmas 19 Mar 2019 arxiv
  • DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills 9 Apr 2018
  • Tuning Recurrent Neural Networks with Reinforcement Learning