RepL4RL
Representation Learning for Reinforcement Learning
A curated list of papers that apply representation learning (RepL) in reinforcement learning (RL).
Why RepL for RL?
A major reason to apply RepL in RL is to handle problems with high-dimensional state-action spaces. Another motivation is to improve sample efficiency. Specifically, we usually want to incorporate inductive biases, i.e., structural information about the tasks/environments, into the representations to achieve better performance.
- Prevalent RL methods require a lot of supervision.
- Instead of only learning from reward signals, we can also learn from the collected data.
- Previous methods are sample-inefficient in vision-based RL.
- Good representations can accelerate learning from images.
- Most current RL agents are task-specific.
- Good representations can generalize well across different tasks, or adapt quickly to new tasks.
- Effective exploration is challenging in many RL tasks.
- Good representations can accelerate exploration.
Challenges
- Sequential data
- Interactive learning tasks
Methods
Some popular methods of applying RepL in RL (a minimal contrastive-learning sketch follows this list):
- Auxiliary tasks, e.g., reconstruction, mutual information (MI) maximization, entropy maximization, dynamics prediction.
- ACL, APS, AVFs, CIC, CPC, DBC, Dreamer, DreamerV2, DynE, IDAAC, PBL, PI-SAC, PlaNet, RCRL, SLAC, SAC-AE, SPR, ST-DIM, TIA, UNREAL, Value-Improvement Path, World Model.
- Contrastive learning.
- ACL, ATC, Contrastive Fourier, CURL, RCRL, CoBERL.
- Data augmentation.
- DrQ, DrQ-v2, PSEs, RAD.
- Bisimulation.
- DBC, PSEs.
- Causal inference.
- MISA.
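Many of the methods above (e.g., CURL, ATC, RAD/DrQ) train an image encoder with an InfoNCE-style contrastive loss over augmented views, used as an auxiliary objective alongside the RL loss. The sketch below is only a minimal illustration of that idea, not code from any listed paper; the `Encoder`, `random_crop`, feature dimension, and bilinear similarity matrix `W` are assumptions made for this example.

```python
# Minimal sketch of a CURL-style contrastive auxiliary loss (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


def random_crop(imgs: torch.Tensor, out_size: int = 84) -> torch.Tensor:
    """Randomly crop a batch of images (B, C, H, W) to (B, C, out_size, out_size)."""
    b, c, h, w = imgs.shape
    crops = []
    for i in range(b):
        top = torch.randint(0, h - out_size + 1, (1,)).item()
        left = torch.randint(0, w - out_size + 1, (1,)).item()
        crops.append(imgs[i, :, top:top + out_size, left:left + out_size])
    return torch.stack(crops)


class Encoder(nn.Module):
    """Tiny CNN encoder mapping image observations to a latent feature vector."""
    def __init__(self, in_channels: int = 3, feature_dim: int = 50):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():
            n_flat = self.conv(torch.zeros(1, in_channels, 84, 84)).shape[1]
        self.fc = nn.Linear(n_flat, feature_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(x))


def contrastive_auxiliary_loss(encoder: Encoder, obs: torch.Tensor,
                               W: torch.Tensor) -> torch.Tensor:
    """InfoNCE loss over two augmented views of the same observations.

    Positives are the two crops of the same frame; all other frames in the
    batch act as negatives. W is a learnable bilinear similarity matrix.
    """
    z_anchor = encoder(random_crop(obs))        # query view (gradients flow)
    with torch.no_grad():
        z_target = encoder(random_crop(obs))    # key view (stop-gradient)
    logits = z_anchor @ W @ z_target.t()        # (B, B) pairwise similarities
    labels = torch.arange(obs.shape[0])         # positives lie on the diagonal
    return F.cross_entropy(logits, labels)


if __name__ == "__main__":
    encoder = Encoder()
    W = nn.Parameter(torch.eye(50))             # learnable bilinear similarity
    obs = torch.rand(8, 3, 100, 100)            # fake batch of image observations
    loss = contrastive_auxiliary_loss(encoder, obs, W)
    loss.backward()                             # gradients reach encoder and W
    print(loss.item())
```

In practice the key view is usually encoded with a momentum (EMA) copy of the encoder rather than a plain stop-gradient, and the contrastive loss is simply added to the actor-critic update on the same replay batch.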
Workshops
- Self-supervision for Reinforcement Learning @ ICLR 2021
- Unsupervised Reinforcement Learning @ ICML 2021
Related Work
- Self-Supervised Learning
- Invariant Representation Learning
Papers
Vision-based Control
- [arXiv' 18][CPC] Representation Learning with Contrastive Predictive Coding
- [NeurIPS' 19][AVFs] A Geometric Perspective on Optimal Representations for Reinforcement Learning
- [NeurIPS' 19] Discovery of Useful Questions as Auxiliary Tasks
- [NeurIPS' 19][ST-DIM] Unsupervised State Representation Learning in Atari (Code)
- [NeurIPS' 20][PI-SAC] Predictive Information Accelerates Learning in RL (Code)
- [NeurIPS' 20][SLAC] Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model (Code)
- [NeurIPS' 20][RAD] Reinforcement Learning with Augmented Data (Code)
- [ICML' 20][CURL] Contrastive Unsupervised Representations for Reinforcement Learning (Code)
- [ICLR' 20][DynE] Dynamics-aware Embeddings (Code)
- [NeurIPS' 21] An Empirical Investigation of Representation Learning for Imitation (Code)
- [NeurIPS' 21][SGI] Pretraining Representations for Data-Efficient Reinforcement Learning (Code)
- [AAAI' 21][SAC-AE] Improving Sample Efficiency in Model-Free Reinforcement Learning from Images (Code)
- [AAAI' 21][Value-Improvement Path] Towards Better Representations for Reinforcement Learning
- [AISTATS' 21] On The Effect of Auxiliary Tasks on Representation Dynamics
- [ICLR' 21][SPR] Data-Efficient RL with Self-Predictive Representations (Code)
- [ICLR' 21][DBC] Learning Invariant Representations for Reinforcement Learning without Reconstruction (Code)
- [ICLR' 21][DrQ] Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels (Code)
- [ICLR' 21][RCRL] Return-based Contrastive Representation Learning for RL
- [ICML' 21][ATC] Decoupling Representation Learning from Reinforcement Learning (Code)
- [ICML' 21][APS] Active Pretraining with Successor Features
- [ICML' 21][IDAAC] Decoupling Value and Policy for Generalization in Reinforcement Learning (Code)
- [ICLR' 22][DrQ-v2] Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning (Code)
- [ICLR' 22][CoBERL] Contrastive BERT for Reinforcement Learning
- [arXiv' 22][R3M] A Universal Visual Representation for Robot Manipulation (Code)
Theory
- [ICML' 19] DeepMDP: Learning Continuous Latent Space Models for Representation Learning
- [ICML' 20] Learning with Good Feature Representations in Bandits and in RL with a Generative Model
- [ICML' 20] Representations for Stable Off-Policy Reinforcement Learning :heart:
- [ICLR' 20] Is a good representation sufficient for sample efficient reinforcement learning?
- [ICLR' 21] Impact of Representation Learning in Linear Bandits
- [arXiv' 21] Model-free Representation Learning and Exploration in Low-rank MDPs
- [arXiv' 21] Representation Learning for Online and Offline RL in Low-rank MDPs :heart:
- [arXiv' 21] Action-Sufficient State Representation Learning for Control with Structural Constraints
- [arXiv' 21] Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions
Low-rank MDPs
- Model-free Representation Learning and Exploration in Low-rank MDPs
- FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs
- Representation Learning for Online and Offline RL in Low-rank MDPs
- Provably Efficient Representation Learning in Low-rank Markov Decision Processes
Offline RL
- [NeurIPS' 21][Contrastive Fourier] Provable Representation Learning for Imitation with Contrastive Fourier Features (Code)
- [NeurIPS' 21][DR3] DR3: Value-Based Deep RL Requires Explicit Regularization :heart:
- [ICLR' 21] Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning :heart:
- [ICML' 21][ACL] Representation Matters: Offline Pretraining for Sequential Decision Making (Code)
- [ICML' 21] Instabilities of Offline RL with Pre-Trained Neural Representation
Model-based RL
- [NeurIPS' 18][World Model] Recurrent World Models Facilitate Policy Evolution
- [ICML' 19][PlaNet] Learning Latent Dynamics for Planning from Pixels (Code)
- [ICLR' 20][Dreamer] Dream to Control: Learning Behaviors by Latent Imagination (Code)
- [ICLR' 21][DreamerV2] Mastering Atari with Discrete World Models (Code)
- [ICML' 21][TIA] Learning Task Informed Abstractions (Code)
Multi-task RL
- [ICLR' 17][UNREAL] Reinforcement Learning with Unsupervised Auxiliary Tasks
- [ICML' 20][PBL] Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning
Exploration
- [NeurIPS' 20] Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning (Code)
- [ICML' 21][RL-Proto] Reinforcement Learning with Prototypical Representations (Code)
- [ICML WS' 21][FittedKDE] Density-Based Bonuses on Learned Representations for Reward-Free Exploration in Deep Reinforcement Learning
Generalization
- [ICML' 20][MISA] Invariant Causal Prediction for Block MDPs (Code)
- [ICML' 21][IDAAC] Decoupling Value and Policy for Generalization in Reinforcement Learning
- [arXiv' 22][CIC] Contrastive Intrinsic Control for Unsupervised Skill Discovery (Code)
- [AISTATS' 22] On the Generalization of Representations in Reinforcement Learning :heart: