RLnotes

Reinforcement Learning

Algorithm

Playing Atari with Deep Reinforcement Learning(NIPS 2013, Deep Q-learning with Experience Replay)
Deterministic Policy Gradient Algorithms(ICML 2014, DPG)
Human-level control through deep reinforcement learning(Nature 2015, target Q-network)
Deep Reinforcement Learning with Double Q-learning(AAAI 2016, Double DQN)
Prioritized Experience Replay(ICLR 2016, Prioritized replay)
Hindsight Experience Replay(arxiv 2017, HER)
Dueling Network Architectures for Deep Reinforcement Learning(ICML 2016, Dueling DQN)
Mastering the game of Go with deep neural networks and tree search(Nature 2016, AlphaGo)
Continuous control with deep reinforcement learning(ICLR 2016, DDPG)
Continuous Deep Q-Learning with Model-based Acceleration(blog & Zhihu, DeepMind 2016, NAF)
Asynchronous Methods for Deep Reinforcement Learning (ICML 2016, A3C)
Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU(ICLR 2017, GA3C)
Generative Adversarial Imitation Learning(NIPS 2016, GAIL)
Proximal Policy Optimization Algorithms(arxiv 2017, OpenAI PPO)
Emergence of Locomotion Behaviours in Rich Environments (arxiv 2017, DeepMind PPO)
Reinforcement Learning with Deep Energy-Based Policies(blog, ICML 2017, Soft Q-learning)
Mastering the game of Go without human knowledge(AlphaGo Zero, Nature 2017)
Soft Actor-Critic Algorithms and Applications(arxiv 2018, Soft Actor-Critic)
A Distributional Perspective on Reinforcement Learning(ICML 2017, Distributional RL, C51)
Meta-Learning Shared Hierarchies(blog, OpenAI 2017, Hierarchical RL)
Rainbow: Combining improvements in deep reinforcement learning(AAAI 2018, Rainbow)
Multi-task Deep Reinforcement Learning with PopArt(PopArt, trains a single agent to play all 57 diverse Atari games by adaptively normalizing each game's learning targets)
Neural scene representation and rendering(blog, Science 2018, Generative Query Network (GQN))
World Models(blog, NIPS 2018, World Models = Vision model (VAE) + Memory (RNN+MDN) + Compact Controller (CMA-ES); the first known agent to solve the OpenAI Gym CarRacing task; later, better solutions: PlaNet(Google 2019), Dreamer(Google 2020), Self-Attention Agent(Google 2020, far fewer parameters))
Reinforcement Learning for Improving Agent Design(Google 2018, joint learning of the policy and the agent's body structure)
Distributed Distributional Deterministic Policy Gradients(ICLR 2018, D4PG = distributional RL + distributed sampling (Ape-X) + N-step returns + Prioritized Experience Replay (PER))
A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play(AlphaZero, Science 2018)
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model(MuZero, DeepMind 2019)
Agent57: Outperforming the Atari Human Benchmark(blog, DeepMind 2020, Agent57, the first deep RL agent that outperforms the standard human benchmark on all 57 Atari games)
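
Several DQN-family entries above differ mainly in how the bootstrap target is computed. The following is a minimal sketch in plain Python, not any paper's code. Q-values are passed in as lists rather than produced by real networks, and the function names are illustrative. It contrasts the target-network target (Nature 2015) with the Double DQN target:

```python
from typing import Sequence


def dqn_target(reward: float, next_q_target: Sequence[float],
               gamma: float, done: bool) -> float:
    """DQN target with a target network: r + gamma * max_a Q_target(s', a)."""
    if done:
        return reward  # no bootstrapping from a terminal state
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward: float, next_q_online: Sequence[float],
                      next_q_target: Sequence[float], gamma: float,
                      done: bool) -> float:
    """Double DQN target: the online network selects a', the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Q-values for the next state s' (three actions), made up for illustration.
q_online = [1.0, 2.0, 0.5]
q_target = [3.0, 1.5, 0.0]
print(round(dqn_target(1.0, q_target, 0.9, False), 2))                   # 3.7
print(round(double_dqn_target(1.0, q_online, q_target, 0.9, False), 2))  # 2.35
```

With the same inputs, Double DQN picks action 1 from the online values and then scores it with the target network. The result is a smaller target than the plain max over the target network. This decoupling of action selection from action evaluation is what the Double DQN paper uses to reduce overestimation.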

Meta-Learning

Learning to reinforcement learn(DeepMind 2017)
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks(ICML 2017, MAML)
Reptile: A Scalable Meta-Learning Algorithm(OpenAI 2018, Reptile)
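
Reptile's update is simple enough to show on a toy problem. This sketch rests on several assumptions and is not OpenAI's code:

- Each "task" is a 1-D quadratic loss (θ - c)² with its own optimum c.
- Inner adaptation is a few plain SGD steps.
- It uses the batched form of the update, θ ← θ + ε · mean_i(φ_i - θ), where φ_i are the task-adapted parameters.

The function names are hypothetical.

```python
def inner_sgd(theta: float, center: float, steps: int, lr: float) -> float:
    """Adapt to one task: plain SGD on the task loss (phi - center)**2."""
    phi = theta
    for _ in range(steps):
        phi -= lr * 2.0 * (phi - center)  # gradient of (phi - center)**2
    return phi


def reptile(task_centers, meta_iters=200, inner_steps=5, inner_lr=0.1,
            meta_lr=0.5, theta=0.0):
    """Batched Reptile: move theta toward the average of the task-adapted parameters."""
    for _ in range(meta_iters):
        adapted = [inner_sgd(theta, c, inner_steps, inner_lr) for c in task_centers]
        theta += meta_lr * sum(phi - theta for phi in adapted) / len(adapted)
    return theta


theta = reptile([-2.0, 0.0, 5.0])
print(theta)  # converges to ~1.0, the mean of the task optima
```

All toy tasks share the same curvature, so the learned initialisation settles at the mean of the task optima. That is the starting point that minimises the average loss after the inner SGD steps.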

Curriculum learning

Curriculum learning(ACM 2009, gradually increases the complexity of the learning task by presenting progressively more difficult examples to the learning algorithm)
Automated Curriculum Learning for Neural Networks
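
The curriculum idea described above (train on easy examples first, then progressively admit harder ones) can be sketched as a schedule of growing training pools. This assumes a hand-chosen difficulty score per example, which is the simplest setting. The function name and the word-count difficulty are illustrative only. The automated variant in the second entry instead chooses tasks adaptively from the learner's measured progress.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def curriculum_stages(examples: Sequence[T], difficulty: Callable[[T], float],
                      n_stages: int) -> List[List[T]]:
    """Training pools that grow from the easiest examples to the full set."""
    ordered = sorted(examples, key=difficulty)
    stages = []
    for k in range(1, n_stages + 1):
        cutoff = round(len(ordered) * k / n_stages)  # fraction k / n_stages of the data
        stages.append(list(ordered[:cutoff]))
    return stages


# Illustrative difficulty score: longer sentences are treated as harder.
sentences = ["a b c d", "a", "a b", "a b c"]
stages = curriculum_stages(sentences, lambda s: len(s.split()), n_stages=2)
for pool in stages:
    print(pool)  # train on this pool before moving to the next, larger one
```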

Curiosity & Exploration & Reward Shaping

With sparse external reward

Reinforcement Learning with Unsupervised Auxiliary Tasks(DeepMind 2016, UNREAL)
Curiosity-driven exploration by self-supervised prediction(ICML 2017, Intrinsic Curiosity Module)
Exploration by Random Network Distillation(blog, RND, exceeds average human performance on Montezuma’s Revenge)
Episodic curiosity through reachability(code & blog, Google Brain & DeepMind 2019, maximizes curiosity only when it is conducive to the ultimate goal)
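
In RND, the intrinsic reward is the error of a trained predictor that imitates a fixed, randomly initialised network. This can be sketched in plain Python as a toy under stated assumptions:

- A one-layer tanh target and a linear predictor stand in for the paper's convolutional networks.
- Observation and reward normalisation are omitted.
- The class and method names are hypothetical.

States the agent has visited often get a low reward, because the predictor has learned them.

```python
import math
import random


class RND:
    def __init__(self, obs_dim: int, feat_dim: int, lr: float = 0.1, seed: int = 0):
        rng = random.Random(seed)
        # Fixed, randomly initialised target network: never trained.
        self.target = [[rng.gauss(0.0, 1.0) for _ in range(obs_dim)] for _ in range(feat_dim)]
        # Predictor (linear here), trained to imitate the target's features.
        self.pred = [[0.0] * obs_dim for _ in range(feat_dim)]
        self.lr = lr

    def _target_features(self, obs):
        return [math.tanh(sum(w * x for w, x in zip(row, obs))) for row in self.target]

    def _pred_features(self, obs):
        return [sum(w * x for w, x in zip(row, obs)) for row in self.pred]

    def intrinsic_reward(self, obs) -> float:
        """Squared prediction error: high for novel states, low for familiar ones."""
        t, p = self._target_features(obs), self._pred_features(obs)
        return sum((pi - ti) ** 2 for pi, ti in zip(p, t))

    def update(self, obs) -> None:
        """One SGD step on the squared prediction error for a visited state."""
        t, p = self._target_features(obs), self._pred_features(obs)
        for i, row in enumerate(self.pred):
            err = p[i] - t[i]
            for j, x in enumerate(obs):
                row[j] -= self.lr * 2.0 * err * x


rnd = RND(obs_dim=4, feat_dim=8)
seen = [1.0, 0.0, 0.0, 0.0]
novel = [0.0, 1.0, 0.0, 0.0]
for _ in range(100):
    rnd.update(seen)  # the agent keeps visiting the same state
print(rnd.intrinsic_reward(seen), rnd.intrinsic_reward(novel))
```

After training, the reward for the repeatedly visited state is close to zero, while the unvisited state still earns a clearly positive bonus.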

Even without external reward

Apprenticeship learning via Inverse Reinforcement Learning(ICML 2004, Inverse Reinforcement Learning)
Deep reinforcement learning from human preferences(blog, arxiv 2017, needs only 900 bits of feedback from a human evaluator to learn a backflip, a task that is simple to judge but challenging to specify)
Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play(ICLR 2018, Self-play: Alice and Bob)
Large-Scale Study of Curiosity-Driven Learning(website, OpenAI 2018, "More generally, these results suggest that, in environments designed by humans, the extrinsic reward is perhaps often aligned with the objective of seeking novelty.")
End-to-End Robotic Reinforcement Learning without Reward Engineering(code, RSS 2019, Berkeley, trains a success classifier on images of successful outcomes, uses its log-probabilities as the reward for reinforcement learning, and actively queries the human user to refine the classifier)
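
The human-preferences entry fits a reward model to pairwise comparisons of trajectory segments with a Bradley-Terry model:

P(σ¹ ≻ σ²) = exp Σ r(σ¹) / (exp Σ r(σ¹) + exp Σ r(σ²))

Here is a minimal sketch. It assumes a linear reward r(s) = w·s over hand-made state features instead of the paper's neural network, and all names and data are hypothetical.

```python
import math


def segment_return(w, segment):
    """Predicted return of a segment: sum over its states of the linear reward w . s."""
    return sum(sum(wi * xi for wi, xi in zip(w, s)) for s in segment)


def preference_prob(w, seg_a, seg_b):
    """Bradley-Terry probability that seg_a is preferred over seg_b."""
    diff = segment_return(w, seg_b) - segment_return(w, seg_a)
    return 1.0 / (1.0 + math.exp(diff))


def train_reward(comparisons, dim, lr=0.1, epochs=200):
    """Fit w by SGD on the cross-entropy between predicted and human preferences.

    comparisons: (seg_a, seg_b, label) with label 1.0 if seg_a was preferred, else 0.0.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for seg_a, seg_b, label in comparisons:
            p = preference_prob(w, seg_a, seg_b)
            for i in range(dim):
                # d(loss)/d(w_i) = (p - label) * (feature_i summed over seg_a - over seg_b)
                feat_diff = sum(s[i] for s in seg_a) - sum(s[i] for s in seg_b)
                w[i] -= lr * (p - label) * feat_diff
    return w


# Hypothetical 2-D state features; the "human" prefers the segment with larger feature 0.
seg_high = [[1.0, 0.5], [0.8, 0.2]]
seg_low = [[0.1, 0.5], [0.0, 0.3]]
w = train_reward([(seg_high, seg_low, 1.0), (seg_low, seg_high, 0.0)], dim=2)
print(w, preference_prob(w, seg_high, seg_low))
```

The learned reward then replaces the environment reward for an ordinary RL algorithm. In the paper, new comparisons are also collected as the policy changes.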

Others

Visceral Machines: Risk-Aversion in Reinforcement Learning with Intrinsic Physiological Rewards(ICLR 2019, trains a CNN to predict a human's physiological response and uses the prediction as an intrinsic reward in a navigation task)

Reality Gap

Sim-to-Real: Learning Agile Locomotion For Quadruped Robots(arxiv 2018, Google, "We narrow this reality gap by improving the physics simulator and learning robust policies.")
Sim-to-Real Transfer of Robotic Control with Dynamics Randomization(ICRA 2018, "By randomizing the dynamics of the simulator during training, we are able to develop policies that are capable of adapting to very different dynamics".)
Solving Rubik’s Cube with a Robot Hand(OpenAI 2019, "we developed a new method called Automatic Domain Randomization (ADR), which endlessly generates progressively more difficult environments in simulation. This frees us from having an accurate model of the real world, and enables the transfer of neural networks learned in simulation to be applied to the real world.")
Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience(ICRA 2019, "Rather than manually tuning the randomization of simulations, we adapt the simulation parameter distribution using a few real world roll-outs interleaved with policy training")
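
The domain-randomization entries share one mechanism: resample the simulator's physics parameters every episode so the policy cannot overfit to a single simulated world. The sketch below is illustrative only. The parameter names and ranges are hypothetical. `widen_ranges` is a much-simplified stand-in for OpenAI's ADR, which adjusts each range boundary separately based on performance measured at that boundary.

```python
import random
from dataclasses import dataclass


@dataclass
class PhysicsParams:
    mass_scale: float       # multiplier on link masses
    friction: float         # ground friction coefficient
    motor_latency_s: float  # actuation delay in seconds


# Hypothetical randomization ranges, chosen only for illustration.
RANGES = {
    "mass_scale": (0.8, 1.2),
    "friction": (0.5, 1.25),
    "motor_latency_s": (0.0, 0.04),
}


def sample_params(rng: random.Random, ranges=RANGES) -> PhysicsParams:
    """Draw one simulator configuration uniformly from the current ranges."""
    return PhysicsParams(**{k: rng.uniform(lo, hi) for k, (lo, hi) in ranges.items()})


def widen_ranges(ranges, success_rate, threshold=0.8, step=0.05):
    """Simplified ADR-like rule: widen every range once the policy succeeds often enough."""
    if success_rate < threshold:
        return dict(ranges)
    return {k: (max(0.0, lo - step), hi + step) for k, (lo, hi) in ranges.items()}


rng = random.Random(0)
ranges = RANGES
for episode in range(3):
    params = sample_params(rng, ranges)
    print(episode, params)
    # Here one would reset the simulator with `params` and roll out the policy.
ranges = widen_ranges(ranges, success_rate=0.9)
```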

Multi-Agent

Human-level performance in 3D multiplayer games with population-based reinforcement learning(multiplayer FPS game, DeepMind, Science 2019)

Other issue

Discrete-Continuous Hybrid Action Spaces

Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces(IJCAI 2019)

Benchmark

gym(OpenAI)
Roboschool(OpenAI 2017)
gym Retro(OpenAI, game platform)
Retro Contest(a transfer learning contest for generalization test, contest result)
CoinRun(OpenAI 2018, provide a metric for an agent’s ability to transfer its experience to novel situations)
DeepMind Lab(DeepMind 2016, first-person 3D game platform)
Control Suite(DeepMind 2018)
Unity
pybullet
Pommerman(Multi-Agent "Bomberman"-like game)
football(Google 2019)
ROBEL(Google 2019, ROBEL is an open-source platform of cost-effective robots designed for reinforcement learning in the real world)
RLBench(Robot Learning Benchmark)
highway-env(Gym-like autonomous driving env)

Implementations

OpenAI Baselines(OpenAI)
keras-rl(keras)
rllab(Berkeley)
RLlib(Berkeley, multi-agent)
Horizon(Facebook)
TensorForce(reinforce.io)
Dopamine(Google)
Coach(Intel)
rlkit(personal)
TRFL(DeepMind)
Catalyst.RL(catalyst-team)
RL framework
rlax(DeepMind, a library built on top of JAX that exposes useful building blocks for implementing reinforcement learning agents.)
Tianshou(天授)(Tsinghua)
SEED RL(Google, Scalable and Efficient Deep-RL with Accelerated Central Inference.)

Manipulation

Reinforcement and Imitation Learning for Diverse Visuomotor Skills(blog, DeepMind 2018, few demonstrations + PPO + LSTM + GAIL)
Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research(blog, OpenAI 2018, DDPG + HER with sparse rewards)
Composable Deep Reinforcement Learning for Robotic Manipulation(blog, Berkeley 2018, two strengths of Soft Q-learning: multimodal exploration and composability of learned policies)
One-Shot Visual Imitation Learning via Meta-Learning(CoRL 2017, combine imitation learning with MAML)
One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning(blog, RSS 2018, One-Shot Imitation from Watching Videos without labeled expert actions)
Grasp2Vec: Learning Object Representations from Self-Supervised Grasping(CoRL 2018, Google Brain)

Character Skills

DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills(blog, ACM Transactions on Graphics 2018, Reference State Initialization (RSI) + Early Termination (ET))
SFV: Reinforcement Learning of Physical Skills from Videos(blog, ACM Transactions on Graphics 2018)

Computer Vision

Active Object Localization with Deep Reinforcement Learning(ICCV 2015)
Hierarchical Object Detection with Deep Reinforcement Learning(NIPS 2016)
Crafting a Toolchain for Image Restoration by Deep Reinforcement Learning(CVPR 2018)
Emergence of exploratory look-around behaviors through active observation completion(Science Robotics 2019)
AI Online Filters to Real World Image Recognition(arxiv 2020)

Doom

ViZDoom: A Doom-based AI Research Platform for Visual Reinforcement Learning
Playing doom with slam-augmented deep reinforcement learning(CVPR 2016)
Training Agent for First-Person Shooter Game with Actor-Critic Curriculum Learning(ICLR 2017, winner of ViZDoom 2016 Track 1)
Learning to Act by Predicting the Future(ICLR 2017, winner of ViZDoom 2016 Track 2)
Playing FPS Games with Deep Reinforcement Learning(AAAI 2017, winner of ViZDoom 2017)

Video

Neural Adaptive Video Streaming with Pensieve(ACM 2017)

Legged locomotion

Feedback Control For Cassie With Deep Reinforcement Learning(IROS 2018)
Learning agile and dynamic motor skills for legged robots(Science Robotics 2019, ETH; simulates more than 2000 ANYmals together in real time, and trains a neural network that models the complex actuator dynamics from real-robot data, so the learned policy can be deployed directly on the real system without modification)
Iterative Reinforcement Learning Based Design of Dynamic Locomotion Skills for Cassie(arxiv 2019)
Learning to adapt in dynamic, real-world environments through meta-reinforcement learning(ICLR 2019, Berkeley, uses meta-learning to train a dynamics-model prior that, combined with recent data, can be rapidly adapted to the local context)
Learning to Walk in the Real World with Minimal Human Effort(Google 2020)

Perception

Manipulation by Feel: Touch-Based Control with Deep Predictive Models(arxiv 2019, Berkeley, Haptic sensor)
Motion Perception in Reinforcement Learning with Dynamic Objects(arxiv 2019, image + flow rather than stacked images to include motion information)
Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks(ICRA 2019)

Control

Assembly robots with optimized control stiffness through reinforcement learning(arxiv 2020, generation of nondiagonal stiffness matrices online for admittance control of contact-rich tasks using deep Q-learning)
Inverse Reinforcement Learning with Model Predictive Control(NIPS 2019, Baidu)

Transport

Flow: Architecture and Benchmarking for Reinforcement Learning in Traffic Control(blog, CoRL 2018, UC Berkeley)
IntelliLight: A Reinforcement Learning Approach for Intelligent Traffic Light Control(ACM SIGKDD 2018, Penn State)
CityFlow: A Multi-Agent Reinforcement Learning Environment for Large Scale City Traffic Scenario(blog, SJTU)

Financial

RL-Stock

Others

Hacking Google reCAPTCHA v3 using Reinforcement Learning(arxiv 2019, bypassing CAPTCHAs)
SoftCon: Simulation and Control of Soft-Bodied Animals with Biomimetic Actuators(SIGGRAPH Asia 2019, Soft-Bodied Animals Control)

Blog

Application

Deep Reinforcement Learning: Pong from Pixels(Policy Gradient)
Using Keras and Deep Q-Network to Play FlappyBird(DQN)
Build an AI to play Dino Run(DQN)
Using Deep Q-Learning in FIFA 18 to perfect the art of free-kicks(DQN)
Using Keras and Deep Deterministic Policy Gradient to play TORCS(DDPG)
Self-driving cars in the browser(DDPG)
Use proximal policy optimization to play BipedalWalker and Torcs(PPO)
Reproducing PPO
Simple Reinforcement Learning with Tensorflow Part 8(A3C)
Reinforcement learning with the A3C algorithm(A3C)
A3C Blog Post
AlphaGo Zero demystified
World Models applied to Sonic

Tutorial

Reinforcement Learning: An Introduction (2nd Edition)
OpenAI Spinning up
Hung-yi Lee (李宏毅): Deep Reinforcement Learning
CMU 10703: Deep Reinforcement Learning and Control
Bolei Zhou (周博磊): 2020 RL course

Overview

DeepMind - Deep Reinforcement Learning - RLSS 2017.pdf
A (Long) Peek into Reinforcement Learning
Gists of Recent Deep RL Algorithms
Meta-Learning: Learning to Learn Fast(Metric-based: Convolutional Siamese Neural Network/Matching Networks/Relation Network; Model-based:Memory-Augmented Neural Networks(MANN); Optimization-Based:Model-Agnostic Meta-Learning(MAML)/Reptile)
The Evolution of AlphaGo to MuZero(AlphaGo-> AlphaGo Zero -> AlphaZero -> MuZero)

Rethink

Deep reinforcement learning that matters
Deep Reinforcement Learning Doesn't Work Yet
Reinforcement Learning never worked, and 'deep' only helped a bit.
Lessons Learned Reproducing a Deep Reinforcement Learning Paper(notes)
Where Is Reinforcement Learning Headed? (强化学习路在何方?)
Reinforcement Learning, Fast and Slow

Evolution Strategy

Evolution Strategies as a Scalable Alternative to Reinforcement Learning(blog, OpenAI 2017, ES; advantages: no backpropagation needed, easy to parallelize, more robust (e.g. to frame-skip))
A Visual Guide to Evolution Strategies
Evolving Stable Strategies(ES on robot; task augmentation techniques)
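
The core of the OpenAI ES entry is a gradient estimate built only from fitness evaluations of Gaussian-perturbed parameters:

∇F(θ) ≈ (1/(nσ)) Σ_i F(θ + σε_i) ε_i

The sketch below uses the mirrored-sampling variant, which evaluates θ ± σε_i for each ε_i, on a toy quadratic fitness. It omits the paper's rank-based fitness shaping, parallel workers and shared-seed communication, and the function names are hypothetical.

```python
import random


def evolution_strategies(fitness, theta, sigma=0.1, lr=0.02, pop_size=50,
                         iters=300, seed=0):
    """Gradient ascent on `fitness` using only function evaluations."""
    rng = random.Random(seed)
    dim = len(theta)
    theta = list(theta)
    for _ in range(iters):
        grad = [0.0] * dim
        for _ in range(pop_size):
            eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            # Mirrored (antithetic) pair: evaluate theta + sigma*eps and theta - sigma*eps.
            f_pos = fitness([t + sigma * e for t, e in zip(theta, eps)])
            f_neg = fitness([t - sigma * e for t, e in zip(theta, eps)])
            for i in range(dim):
                grad[i] += (f_pos - f_neg) * eps[i]
        # Gradient estimate: sum of (F+ - F-) * eps divided by 2 * pop_size * sigma.
        for i in range(dim):
            theta[i] += lr * grad[i] / (2 * pop_size * sigma)
    return theta


target = [1.0, -2.0, 0.5]


def fitness(params):
    # Toy objective: maximal (zero) when params equal `target`.
    return -sum((p - t) ** 2 for p, t in zip(params, target))


theta = evolution_strategies(fitness, [0.0, 0.0, 0.0])
print([round(t, 3) for t in theta])  # should be close to `target`
```

Each population member needs only a scalar fitness value and its noise vector, with no backpropagation. This is why the method parallelizes so easily across workers.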

Overview

A Brief Survey of Deep Reinforcement Learning
A Survey of Deep Network Solutions for Learning Control in Robotics: From Reinforcement to Imitation
Multi-Agent Reinforcement Learning: A Report on Challenges and Approaches
Deep Learning for Video Game Playing
Deep Reinforcement Learning for Autonomous Driving: A Survey
Comprehensive Review of Deep Reinforcement Learning Methods and Applications in Economics
Chip Placement with Deep Reinforcement Learning

Online demo

REINFORCEjs