phd-bibliography
References on Optimal Control, Reinforcement Learning and Motion Planning
Bibliography
Table of contents
- Optimal Control
  - Dynamic Programming
  - Linear Programming
  - Tree-Based Planning
  - Control Theory
  - Model Predictive Control
- Safe Control
  - Robust Control
  - Risk-Averse Control
  - Value-Constrained Control
  - State-Constrained Control and Stability
  - Uncertain Dynamical Systems
- Game Theory
- Sequential Learning
  - Multi-Armed Bandit
    - Best Arm Identification
    - Black-box Optimization
- Reinforcement Learning
  - Theory
  - Value-based
  - Policy-based
    - Policy Gradient
    - Actor-critic
    - Derivative-free
  - Model-based
  - Exploration
  - Hierarchy and Temporal Abstraction
  - Partial Observability
  - Transfer
  - Multi-agent
  - Representation Learning
  - Offline
- Learning from Demonstrations
  - Imitation Learning
    - Applications to Autonomous Driving
  - Inverse Reinforcement Learning
    - Applications to Autonomous Driving
- Motion Planning
  - Search
  - Sampling
  - Optimization
  - Reactive
  - Architecture and applications
Optimal Control :dart:
Dynamic Programming
- (book) Dynamic Programming, Bellman R. (1957).
- (book) Dynamic Programming and Optimal Control, Volumes 1 and 2, Bertsekas D. (1995).
- (book) Markov Decision Processes - Discrete Stochastic Dynamic Programming, Puterman M. (1995).
- An Upper Bound on the Loss from Approximate Optimal-Value Functions, Singh S., Yee R. (1994).
- Stochastic optimization of sailing trajectories in an upwind regatta, Dalang R. et al. (2015).
Linear Programming
- (book) Markov Decision Processes - Discrete Stochastic Dynamic Programming, Puterman M. (1995).
- `REPS` Relative Entropy Policy Search, Peters J. et al. (2010).
Tree-Based Planning
- `ExpectiMinimax` Optimal strategy in games with chance nodes, Melkó E., Nagy B. (2007).
- `Sparse sampling` A sparse sampling algorithm for near-optimal planning in large Markov decision processes, Kearns M. et al. (2002).
- `MCTS` Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, Coulom R., SequeL (2006).
- `UCT` Bandit based Monte-Carlo Planning, Kocsis L., Szepesvári C. (2006).
- Bandit Algorithms for Tree Search, Coquelin P-A., Munos R. (2007).
- `OPD` Optimistic Planning for Deterministic Systems, Hren J., Munos R. (2008).
- `OLOP` Open Loop Optimistic Planning, Bubeck S., Munos R. (2010).
- `SOOP` Optimistic Planning for Continuous-Action Deterministic Systems, Buşoniu L. et al. (2011).
- `OPSS` Optimistic planning for sparsely stochastic systems, Buşoniu L., Munos R., De Schutter B., Babuska R. (2011).
- `HOOT` Sample-Based Planning for Continuous Action Markov Decision Processes, Mansley C., Weinstein A., Littman M. (2011).
- `HOLOP` Bandit-Based Planning and Learning in Continuous-Action Markov Decision Processes, Weinstein A., Littman M. (2012).
- `BRUE` Simple Regret Optimization in Online Planning for Markov Decision Processes, Feldman Z. and Domshlak C. (2014).
- `LGP` Logic-Geometric Programming: An Optimization-Based Approach to Combined Task and Motion Planning, Toussaint M. (2015). 🎞️
- `AlphaGo` Mastering the game of Go with deep neural networks and tree search, Silver D. et al. (2016).
- `AlphaGo Zero` Mastering the game of Go without human knowledge, Silver D. et al. (2017).
- `AlphaZero` Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, Silver D. et al. (2017).
- `TrailBlazer` Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning, Grill J. B., Valko M., Munos R. (2017).
- `MCTSnets` Learning to search with MCTSnets, Guez A. et al. (2018).
- `ADI` Solving the Rubik's Cube Without Human Knowledge, McAleer S. et al. (2018).
- `OPC/SOPC` Continuous-action planning for discounted infinite-horizon nonlinear optimal control with Lipschitz values, Buşoniu L., Pall E., Munos R. (2018).
- Real-time tree search with pessimistic scenarios: Winning the NeurIPS 2018 Pommerman Competition, Osogami T., Takahashi T. (2019).
Control Theory
- (book) The Mathematical Theory of Optimal Processes, Pontryagin L. S., Boltyanskii V. G., Gamkrelidze R. V., Mishchenko E. F. (1962).
- (book) Constrained Control and Estimation, Goodwin G. (2005).
- `PI²` A Generalized Path Integral Control Approach to Reinforcement Learning, Theodorou E. et al. (2010).
- `PI²-CMA` Path Integral Policy Improvement with Covariance Matrix Adaptation, Stulp F., Sigaud O. (2010).
- `iLQG` A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems, Todorov E. (2005). :octocat:
- `iLQG+` Synthesis and stabilization of complex behaviors through online trajectory optimization, Tassa Y. (2012).
Model Predictive Control
- (book) Model Predictive Control, Camacho E. (1995).
- (book) Predictive Control With Constraints, Maciejowski J. M. (2002).
- Linear Model Predictive Control for Lane Keeping and Obstacle Avoidance on Low Curvature Roads, Turri V. et al. (2013).
- `MPCC` Optimization-based autonomous racing of 1:43 scale RC cars, Liniger A. et al. (2014). 🎞️ | 🎞️
- `MIQP` Optimal trajectory planning for autonomous driving integrating logical constraints: An MIQP perspective, Qian X., Altché F., Bender P., Stiller C., de La Fortelle A. (2016).
Safe Control :lock:
Robust Control
- Minimax analysis of stochastic problems, Shapiro A., Kleywegt A. (2002).
- `Robust DP` Robust Dynamic Programming, Iyengar G. (2005).
- Robust Planning and Optimization, Laumanns M. (2011). (lecture notes)
- Robust Markov Decision Processes, Wiesemann W., Kuhn D., Rustem B. (2012).
- Safe and Robust Learning Control with Gaussian Processes, Berkenkamp F., Schoellig A. (2015). 🎞️
- `Tube-MPPI` Robust Sampling Based Model Predictive Control with Sparse Objective Information, Williams G. et al. (2018). 🎞️
- Safe Learning in Robotics: From Learning-Based Control to Safe Reinforcement Learning, Brunke L. et al. (2021). :octocat:
Risk-Averse Control
- A Comprehensive Survey on Safe Reinforcement Learning, García J., Fernández F. (2015).
- `RA-QMDP` Risk-averse Behavior Planning for Autonomous Driving under Uncertainty, Naghshvar M. et al. (2018).
- `StoROO` X-Armed Bandits: Optimizing Quantiles and Other Risks, Torossian L., Garivier A., Picheny V. (2019).
- Worst Cases Policy Gradients, Tang Y. C. et al. (2019).
- Model-Free Risk-Sensitive Reinforcement Learning, Delétang G. et al. (2021).
- Optimal Thompson Sampling strategies for support-aware CVaR bandits, Baudry D., Gautron R., Kaufmann E., Maillard O. (2021).
Value-Constrained Control
- `ICS` Will the Driver Seat Ever Be Empty?, Fraichard T. (2014).
- `SafeOPT` Safe Controller Optimization for Quadrotors with Gaussian Processes, Berkenkamp F., Schoellig A., Krause A. (2015). 🎞️ :octocat:
- `SafeMDP` Safe Exploration in Finite Markov Decision Processes with Gaussian Processes, Turchetta M., Berkenkamp F., Krause A. (2016). :octocat:
- `RSS` On a Formal Model of Safe and Scalable Self-driving Cars, Shalev-Shwartz S. et al. (2017).
- `CPO` Constrained Policy Optimization, Achiam J., Held D., Tamar A., Abbeel P. (2017). :octocat:
- `RCPO` Reward Constrained Policy Optimization, Tessler C., Mankowitz D., Mannor S. (2018).
- `BFTQ` A Fitted-Q Algorithm for Budgeted MDPs, Carrara N. et al. (2018).
- `SafeMPC` Learning-based Model Predictive Control for Safe Exploration, Koller T., Berkenkamp F., Turchetta M., Krause A. (2018).
- `CCE` Constrained Cross-Entropy Method for Safe Reinforcement Learning, Wen M., Topcu U. (2018). :octocat:
- `LTL-RL` Reinforcement Learning with Probabilistic Guarantees for Autonomous Driving, Bouton M. et al. (2019).
- Safe Reinforcement Learning with Scene Decomposition for Navigating Complex Urban Environments, Bouton M. et al. (2019). :octocat:
- Batch Policy Learning under Constraints, Le H., Voloshin C., Yue Y. (2019).
- Value constrained model-free continuous control, Bohez S. et al. (2019). 🎞️
- Safely Learning to Control the Constrained Linear Quadratic Regulator, Dean S. et al. (2019).
- Learning to Walk in the Real World with Minimal Human Effort, Ha S. et al. (2020). 🎞️
- Responsive Safety in Reinforcement Learning by PID Lagrangian Methods, Stooke A., Achiam J., Abbeel P. (2020). :octocat:
- `Envelope MOQ-Learning` A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation, Yang R. et al. (2019).
State-Constrained Control and Stability
- `HJI-reachability` Safe learning for control: Combining disturbance estimation, reachability analysis and reinforcement learning with systematic exploration, Heidenreich C. (2017).
- `MPC-HJI` On Infusing Reachability-Based Safety Assurance within Probabilistic Planning Frameworks for Human-Robot Vehicle Interactions, Leung K. et al. (2018).
- A General Safety Framework for Learning-Based Control in Uncertain Robotic Systems, Fisac J. et al. (2017). 🎞️
- Safe Model-based Reinforcement Learning with Stability Guarantees, Berkenkamp F. et al. (2017).
- `Lyapunov-Net` Safe Interactive Model-Based Learning, Gallieri M. et al. (2019).
- Enforcing robust control guarantees within neural network policies, Donti P. et al. (2021). :octocat:
- `ATACOM` Robot Reinforcement Learning on the Constraint Manifold, Liu P. et al. (2021).
Uncertain Dynamical Systems
- Simulation of Controlled Uncertain Nonlinear Systems, Tibken B., Hofer E. (1995).
- Trajectory computation of dynamic uncertain systems, Adrot O., Flaus J-M. (2002).
- Simulation of Uncertain Dynamic Systems Described By Interval Models: a Survey, Puig V. et al. (2005).
- Design of interval observers for uncertain dynamical systems, Efimov D., Raïssi T. (2016).
Game Theory :spades:
- Hierarchical Game-Theoretic Planning for Autonomous Vehicles, Fisac J. et al. (2018).
- Efficient Iterative Linear-Quadratic Approximations for Nonlinear Multi-Player General-Sum Differential Games, Fridovich-Keil D. et al. (2019). 🎞️
Sequential Learning :shoe:
- Prediction, Learning and Games, Cesa-Bianchi N., Lugosi G. (2006).
Multi-Armed Bandit :slot_machine:
- `TS` On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, Thompson W. (1933).
- Exploration and Exploitation in Organizational Learning, March J. (1991).
- `UCB1 / UCB2` Finite-time Analysis of the Multiarmed Bandit Problem, Auer P., Cesa-Bianchi N., Fischer P. (2002).
- `Empirical Bernstein / UCB-V` Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, Audibert J-Y., Munos R., Szepesvari C. (2009).
- Empirical Bernstein Bounds and Sample Variance Penalization, Maurer A., Pontil M. (2009).
- An Empirical Evaluation of Thompson Sampling, Chapelle O., Li L. (2011).
- `kl-UCB` The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond, Garivier A., Cappé O. (2011).
- `KL-UCB` Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation, Cappé O. et al. (2013).
- `IDS` Information Directed Sampling and Bandits with Heteroscedastic Noise, Kirschner J., Krause A. (2018).
Contextual
- `LinUCB` A Contextual-Bandit Approach to Personalized News Article Recommendation, Li L. et al. (2010).
- `OFUL` Improved Algorithms for Linear Stochastic Bandits, Abbasi-yadkori Y., Pal D., Szepesvári C. (2011).
- Contextual Bandits with Linear Payoff Functions, Chu W. et al. (2011).
- Self-normalization techniques for streaming confident regression, Maillard O.-A. (2017).
- Learning from Delayed Outcomes via Proxies with Applications to Recommender Systems, Mann T. et al. (2018). (prediction setting)
- Weighted Linear Bandits for Non-Stationary Environments, Russac Y. et al. (2019).
- Linear bandits with Stochastic Delayed Feedback, Vernade C. et al. (2020).
Best Arm Identification :muscle:
- `Successive Elimination` Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems, Even-Dar E. et al. (2006).
- `LUCB` PAC Subset Selection in Stochastic Multi-armed Bandits, Kalyanakrishnan S. et al. (2012).
- `UGapE` Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence, Gabillon V., Ghavamzadeh M., Lazaric A. (2012).
- `Sequential Halving` Almost Optimal Exploration in Multi-Armed Bandits, Karnin Z. et al. (2013).
- `M-LUCB / M-Racing` Maximin Action Identification: A New Bandit Framework for Games, Garivier A., Kaufmann E., Koolen W. (2016).
- `Track-and-Stop` Optimal Best Arm Identification with Fixed Confidence, Garivier A., Kaufmann E. (2016).
- `LUCB-micro` Structured Best Arm Identification with Fixed Confidence, Huang R. et al. (2017).
Black-box Optimization :black_large_square:
- `GP-UCB` Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design, Srinivas N., Krause A., Kakade S., Seeger M. (2009).
- `HOO` X-Armed Bandits, Bubeck S., Munos R., Stoltz G., Szepesvari C. (2009).
- `DOO/SOO` Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness, Munos R. (2011).
- `StoOO` From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, Munos R. (2014).
- `StoSOO` Stochastic Simultaneous Optimistic Optimization, Valko M., Carpentier A., Munos R. (2013).
- `POO` Black-box optimization of noisy functions with unknown smoothness, Grill J-B., Valko M., Munos R. (2015).
- `EI-GP` Bayesian Optimization in AlphaGo, Chen Y. et al. (2018).
Reinforcement Learning :robot:
- Reinforcement learning: A survey, Kaelbling L. et al. (1996).
Theory :books:
- Expected mistake bound model for on-line reinforcement learning, Fiechter C-N. (1997).
- `UCRL2` Near-optimal Regret Bounds for Reinforcement Learning, Jaksch T. (2010).
- `PSRL` Why is Posterior Sampling Better than Optimism for Reinforcement Learning?, Osband I., Van Roy B. (2016).
- `UCBVI` Minimax Regret Bounds for Reinforcement Learning, Azar M., Osband I., Munos R. (2017).
- `Q-Learning-UCB` Is Q-Learning Provably Efficient?, Jin C., Allen-Zhu Z., Bubeck S., Jordan M. (2018).
- `LSVI-UCB` Provably Efficient Reinforcement Learning with Linear Function Approximation, Jin C., Yang Z., Wang Z., Jordan M. (2019).
- Lipschitz Continuity in Model-based Reinforcement Learning, Asadi K. et al. (2018).
- On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces, Yang Z., Jin C., Wang Z., Wang M., Jordan M. (2021).
Generative Model
- `QVI` On the Sample Complexity of Reinforcement Learning with a Generative Model, Azar M., Munos R., Kappen B. (2012).
- Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal, Agarwal A. et al. (2019).
Policy Gradient
- Policy Gradient Methods for Reinforcement Learning with Function Approximation, Sutton R. et al. (2000).
- Approximately Optimal Approximate Reinforcement Learning, Kakade S., Langford J. (2002).
- On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift, Agarwal A. et al. (2019).
- PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning, Agarwal A. et al. (2020).
- Is the Policy Gradient a Gradient?, Nota C., Thomas P. S. (2020).
Linear Systems
- PAC Adaptive Control of Linear Systems, Fiechter C.-N. (1997).
- `OFU-LQ` Regret Bounds for the Adaptive Control of Linear Quadratic Systems, Abbasi-Yadkori Y., Szepesvari C. (2011).
- `TS-LQ` Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems, Abeille M., Lazaric A. (2018).
- Exploration-Exploitation with Thompson Sampling in Linear Systems, Abeille M. (2017). (phd thesis)
- `Coarse-Id` On the Sample Complexity of the Linear Quadratic Regulator, Dean S., Mania H., Matni N., Recht B., Tu S. (2017).
- Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator, Dean S. et al. (2018).
- Robust exploration in linear quadratic reinforcement learning, Umenberger J. et al. (2019).
- Online Control with Adversarial Disturbances, Agarwal N. et al. (2019).
- Logarithmic Regret for Online Control, Agarwal N. et al. (2019).
Value-based :chart_with_upwards_trend:
- `NFQ` Neural fitted Q iteration - First experiences with a data efficient neural Reinforcement Learning method, Riedmiller M. (2005).
- `DQN` Playing Atari with Deep Reinforcement Learning, Mnih V. et al. (2013). 🎞️
- `DDQN` Deep Reinforcement Learning with Double Q-learning, van Hasselt H., Silver D. et al. (2015).
- `DDDQN` Dueling Network Architectures for Deep Reinforcement Learning, Wang Z. et al. (2015). 🎞️
- `PDDDQN` Prioritized Experience Replay, Schaul T. et al. (2015).
- `NAF` Continuous Deep Q-Learning with Model-based Acceleration, Gu S. et al. (2016).
- `Rainbow` Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel M. et al. (2017).
- `Ape-X DQfD` Observe and Look Further: Achieving Consistent Performance on Atari, Pohlen T. et al. (2018). 🎞️
Policy-based :muscle:
Policy gradient
- `REINFORCE` Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams R. (1992).
- `Natural Gradient` A Natural Policy Gradient, Kakade S. (2002).
- Policy Gradient Methods for Robotics, Peters J., Schaal S. (2006).
- `TRPO` Trust Region Policy Optimization, Schulman J. et al. (2015). 🎞️
- `PPO` Proximal Policy Optimization Algorithms, Schulman J. et al. (2017). 🎞️
- `DPPO` Emergence of Locomotion Behaviours in Rich Environments, Heess N. et al. (2017). 🎞️
Actor-critic
- `AC` Policy Gradient Methods for Reinforcement Learning with Function Approximation, Sutton R. et al. (1999).
- `NAC` Natural Actor-Critic, Peters J. et al. (2005).
- `DPG` Deterministic Policy Gradient Algorithms, Silver D. et al. (2014).
- `DDPG` Continuous Control With Deep Reinforcement Learning, Lillicrap T. et al. (2015). 🎞️ 1 | 2 | 3 | 4
- `MACE` Terrain-Adaptive Locomotion Skills Using Deep Reinforcement Learning, Peng X., Berseth G., van de Panne M. (2016). 🎞️ | 🎞️
- `A3C` Asynchronous Methods for Deep Reinforcement Learning, Mnih V. et al. (2016). 🎞️ 1 | 2 | 3
- `SAC` Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja T. et al. (2018). 🎞️
- `MPO` Maximum a Posteriori Policy Optimisation, Abdolmaleki A. et al. (2018).
- A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms, Zhang S., Laroche R. et al. (2020).
Derivative-free
- `CEM` Learning Tetris Using the Noisy Cross-Entropy Method, Szita I., Lörincz A. (2006). 🎞️
- `CMAES` Completely Derandomized Self-Adaptation in Evolution Strategies, Hansen N., Ostermeier A. (2001).
- `NEAT` Evolving Neural Networks through Augmenting Topologies, Stanley K. (2002). 🎞️
- `iCEM` Sample-efficient Cross-Entropy Method for Real-time Planning, Pinneri C. et al. (2020).
Model-based :world_map:
- `Dyna` Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, Sutton R. (1990).
- `PILCO` PILCO: A Model-Based and Data-Efficient Approach to Policy Search, Deisenroth M., Rasmussen C. (2011). (talk)
- `DBN` Probabilistic MDP-behavior planning for cars, Brechtel S. et al. (2011).
- `GPS` End-to-End Training of Deep Visuomotor Policies, Levine S. et al. (2015). 🎞️
- `DeepMPC` DeepMPC: Learning Deep Latent Features for Model Predictive Control, Lenz I. et al. (2015). 🎞️
- `SVG` Learning Continuous Control Policies by Stochastic Value Gradients, Heess N. et al. (2015). 🎞️
- `FARNN` Nonlinear Systems Identification Using Deep Dynamic Neural Networks, Ogunmolu O. et al. (2016). :octocat:
- Optimal control with learned local models: Application to dexterous manipulation, Kumar V. et al. (2016). 🎞️
- `BPTT` Long-term Planning by Short-term Prediction, Shalev-Shwartz S. et al. (2016). 🎞️ 1 | 2
- Deep visual foresight for planning robot motion, Finn C., Levine S. (2016). 🎞️
- `VIN` Value Iteration Networks, Tamar A. et al. (2016). 🎞️
- `VPN` Value Prediction Network, Oh J. et al. (2017).
- `DistGBP` Model-Based Planning with Discrete and Continuous Actions, Henaff M. et al. (2017). 🎞️ 1 | 2
- Prediction and Control with Temporal Segment Models, Mishra N. et al. (2017).
- `Predictron` The Predictron: End-To-End Learning and Planning, Silver D. et al. (2017). 🎞️
- `MPPI` Information Theoretic MPC for Model-Based Reinforcement Learning, Williams G. et al. (2017). :octocat: 🎞️
- Learning Real-World Robot Policies by Dreaming, Piergiovanni A. et al. (2018).
- Coupled Longitudinal and Lateral Control of a Vehicle using Deep Learning, Devineau G., Polack P., Altché F., Moutarde F. (2018). 🎞️
- `PlaNet` Learning Latent Dynamics for Planning from Pixels, Hafner D. et al. (2018). 🎞️
- `NeuralLander` Neural Lander: Stable Drone Landing Control using Learned Dynamics, Shi G. et al. (2018). 🎞️
- `DBN+POMCP` Towards Human-Like Prediction and Decision-Making for Automated Vehicles in Highway Scenarios, Sierra Gonzalez D. (2019).
- Planning with Goal-Conditioned Policies, Nasiriany S. et al. (2019). 🎞️
- `MuZero` Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model, Schrittwieser J. et al. (2019). :octocat:
- `BADGR` BADGR: An Autonomous Self-Supervised Learning-Based Navigation System, Kahn G., Abbeel P., Levine S. (2020). 🎞️ :octocat:
- `H-UCRL` Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning, Curi S., Berkenkamp F., Krause A. (2020). :octocat:
Exploration :tent:
- Combating Reinforcement Learning's Sisyphean Curse with Intrinsic Fear, Lipton Z. et al. (2016).
- `Pseudo-count` Unifying Count-Based Exploration and Intrinsic Motivation, Bellemare M. et al. (2016). 🎞️
- `HER` Hindsight Experience Replay, Andrychowicz M. et al. (2017). 🎞️
- `VHER` Visual Hindsight Experience Replay, Sahni H. et al. (2019).
- `RND` Exploration by Random Network Distillation, Burda Y. et al. (OpenAI) (2018). 🎞️
- `Go-Explore` Go-Explore: a New Approach for Hard-Exploration Problems, Ecoffet A. et al. (Uber) (2018). 🎞️
- `C51-IDS` Information-Directed Exploration for Deep Reinforcement Learning, Nikolov N., Kirschner J., Berkenkamp F., Krause A. (2019). :octocat:
- `Plan2Explore` Planning to Explore via Self-Supervised World Models, Sekar R. et al. (2020). 🎞️ :octocat:
- `RIDE` RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments, Raileanu R., Rocktäschel T. (2020). :octocat:
Hierarchy and Temporal Abstraction :clock2:
- Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning, Sutton R. et al. (1999).
- Intrinsically motivated learning of hierarchical collections of skills, Barto A. et al. (2004).
- `OC` The Option-Critic Architecture, Bacon P-L., Harb J., Precup D. (2016).
- Learning and Transfer of Modulated Locomotor Controllers, Heess N. et al. (2016). 🎞️
- Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving, Shalev-Shwartz S. et al. (2016).
- `FuNs` FeUdal Networks for Hierarchical Reinforcement Learning, Vezhnevets A. et al. (2017).
- Combining Neural Networks and Tree Search for Task and Motion Planning in Challenging Environments, Paxton C. et al. (2017). 🎞️
- `DeepLoco` DeepLoco: Dynamic Locomotion Skills Using Hierarchical Deep Reinforcement Learning, Peng X. et al. (2017). 🎞️ | 🎞️
- Hierarchical Policy Design for Sample-Efficient Learning of Robot Table Tennis Through Self-Play, Mahjourian R. et al. (2018). 🎞️
- `DAC` DAC: The Double Actor-Critic Architecture for Learning Options, Zhang S., Whiteson S. (2019).
- Multi-Agent Manipulation via Locomotion using Hierarchical Sim2Real, Nachum O. et al. (2019). 🎞️
- SoftCon: Simulation and Control of Soft-Bodied Animals with Biomimetic Actuators, Min S. et al. (2020). 🎞️ :octocat:
- `H-REIL` Reinforcement Learning based Control of Imitative Policies for Near-Accident Driving, Cao Z. et al. (2020). 🎞️ 1 | 2
Partial Observability :eye:
- `PBVI` Point-based Value Iteration: An anytime algorithm for POMDPs, Pineau J. et al. (2003).
- `cPBVI` Point-Based Value Iteration for Continuous POMDPs, Porta J. et al. (2006).
- `POMCP` Monte-Carlo Planning in Large POMDPs, Silver D., Veness J. (2010).
- A POMDP Approach to Robot Motion Planning under Uncertainty, Du Y. et al. (2010).
- Probabilistic Online POMDP Decision Making for Lane Changes in Fully Automated Driving, Ulbrich S., Maurer M. (2013).
- Solving Continuous POMDPs: Value Iteration with Incremental Learning of an Efficient Space Representation, Brechtel S. et al. (2013).
- Probabilistic Decision-Making under Uncertainty for Autonomous Driving using Continuous POMDPs, Brechtel S. et al. (2014).
- `MOMDP` Intention-Aware Motion Planning, Bandyopadhyay T. et al. (2013).
- `DNC` Hybrid computing using a neural network with dynamic external memory, Graves A. et al. (2016). 🎞️
- The value of inferring the internal state of traffic participants for autonomous freeway driving, Sunberg Z. et al. (2017).
- Belief State Planning for Autonomously Navigating Urban Intersections, Bouton M., Cosgun A., Kochenderfer M. (2017).
- Scalable Decision Making with Sensor Occlusions for Autonomous Driving, Bouton M. et al. (2018).
- Probabilistic Decision-Making at Road Intersections: Formulation and Quantitative Evaluation, Barbier M., Laugier C., Simonin O., Ibanez J. (2018).
- Beauty and the Beast: Optimal Methods Meet Learning for Drone Racing, Kaufmann E. et al. (2018). 🎞️
- `social perception` Behavior Planning of Autonomous Cars with Social Perception, Sun L. et al. (2019).
Transfer :earth_americas:
- `IT&E` Robots that can adapt like animals, Cully A., Clune J., Tarapore D., Mouret J-B. (2014). 🎞️
- `MAML` Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, Finn C., Abbeel P., Levine S. (2017). 🎞️
- Virtual to Real Reinforcement Learning for Autonomous Driving, Pan X. et al. (2017). 🎞️
- Sim-to-Real: Learning Agile Locomotion For Quadruped Robots, Tan J. et al. (2018). 🎞️
- `ME-TRPO` Model-Ensemble Trust-Region Policy Optimization, Kurutach T. et al. (2018). 🎞️
- Kickstarting Deep Reinforcement Learning, Schmitt S. et al. (2018).
- Learning Dexterous In-Hand Manipulation, OpenAI (2018). 🎞️
- `GrBAL / ReBAL` Learning to Adapt in Dynamic, Real-World Environments Through Meta-Reinforcement Learning, Nagabandi A. et al. (2018). 🎞️
- Learning agile and dynamic motor skills for legged robots, Hwangbo J. et al. (ETH Zurich / Intel ISL) (2019). 🎞️
- Robust Recovery Controller for a Quadrupedal Robot using Deep Reinforcement Learning, Lee J., Hwangbo J., Hutter M. (ETH Zurich RSL) (2019).
- `IT&E` Learning and adapting quadruped gaits with the "Intelligent Trial & Error" algorithm, Dalin E., Desreumaux P., Mouret J-B. (2019). 🎞️
- `FAMLE` Fast Online Adaptation in Robotics through Meta-Learning Embeddings of Simulated Priors, Kaushik R., Anne T., Mouret J-B. (2020). 🎞️
- Robust Deep Reinforcement Learning against Adversarial Perturbations on Observations, Zhang H. et al. (2020). :octocat:
- Learning quadrupedal locomotion over challenging terrain, Lee J. et al. (2020). 🎞️
- `PACOH` PACOH: Bayes-Optimal Meta-Learning with PAC-Guarantees, Rothfuss J., Fortuin V., Josifoski M., Krause A. (2021).
- Model-Based Domain Generalization, Robey A. et al. (2021).
- `SimGAN` SimGAN: Hybrid Simulator Identification for Domain Adaptation via Adversarial Reinforcement Learning, Jiang Y. et al. (2021). 🎞️ :octocat:
- Learning robust perceptive locomotion for quadrupedal robots in the wild, Miki T. et al. (2022).
Multi-agent :two_men_holding_hands:
- `Minimax-Q` Markov games as a framework for multi-agent reinforcement learning, Littman M. (1994).
- Autonomous Agents Modelling Other Agents: A Comprehensive Survey and Open Problems, Albrecht S., Stone P. (2017).
- `MILP` Time-optimal coordination of mobile robots along specified paths, Altché F. et al. (2016). 🎞️
- `MIQP` An Algorithm for Supervised Driving of Cooperative Semi-Autonomous Vehicles, Altché F. et al. (2017). 🎞️
- `SA-CADRL` Socially Aware Motion Planning with Deep Reinforcement Learning, Chen Y. et al. (2017). 🎞️
- Multipolicy decision-making for autonomous driving via changepoint-based behavior prediction: Theory and experiment, Galceran E. et al. (2017).
- Online decision-making for scalable autonomous systems, Wray K. et al. (2017).
- `MAgent` MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence, Zheng L. et al. (2017). 🎞️
- Cooperative Motion Planning for Non-Holonomic Agents with Value Iteration Networks, Rehder E. et al. (2017).
- `MPPO` Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning, Long P. et al. (2017). 🎞️
- `COMA` Counterfactual Multi-Agent Policy Gradients, Foerster J. et al. (2017).
- `MADDPG` Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, Lowe R. et al. (2017). :octocat:
- `FTW` Human-level performance in first-person multiplayer games with population-based deep reinforcement learning, Jaderberg M. et al. (2018). 🎞️
- Towards Learning Multi-agent Negotiations via Self-Play, Tang Y. C. (2020).
- `MAPPO` The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games, Yu C. et al. (2021). [:octocat:](https://github.com/marlbenchmark/on-policy)
- Many-agent Reinforcement Learning, Yang Y. (2021).
Representation Learning
- Variable Resolution Discretization in Optimal Control, Munos R., Moore A. (2002). 🎞️
- `DeepDriving` DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving, Chen C. et al. (2015). 🎞️
- On the Sample Complexity of End-to-end Training vs. Semantic Abstraction Training, Shalev-Shwartz S. et al. (2016).
- Learning sparse representations in reinforcement learning with sparse coding, Le L., Kumaraswamy M., White M. (2017).
- World Models, Ha D., Schmidhuber J. (2018). 🎞️ :octocat:
- Learning to Drive in a Day, Kendall A. et al. (2018). 🎞️
- `MERLIN` Unsupervised Predictive Memory in a Goal-Directed Agent, Wayne G. et al. (2018). 🎞️ 1 | 2 | 3 | 4 | 5 | 6
- Variational End-to-End Navigation and Localization, Amini A. et al. (2018). 🎞️
- Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks, Lee M. et al. (2018). 🎞️
- Deep Neuroevolution of Recurrent and Discrete World Models, Risi S., Stanley K.O. (2019). 🎞️ :octocat:
- `FERM` A Framework for Efficient Robotic Manipulation, Zhan A., Zhao R. et al. (2021). :octocat:
- `S4RL` S4RL: Surprisingly Simple Self-Supervision for Offline Reinforcement Learning, Sinha S. et al. (2021).
Offline
- `SPI-BB` Safe Policy Improvement with Baseline Bootstrapping, Laroche R. et al. (2019).
- `AWAC` AWAC: Accelerating Online Reinforcement Learning with Offline Datasets, Nair A. et al. (2020).
- `CQL` Conservative Q-Learning for Offline Reinforcement Learning, Kumar A. et al. (2020).
- Decision Transformer: Reinforcement Learning via Sequence Modeling, Chen L., Lu K. et al. (2021). :octocat:
- Reinforcement Learning as One Big Sequence Modeling Problem, Janner M., Li Q., Levine S. (2021).
Other
- Is the Bellman residual a bad proxy?, Geist M., Piot B., Pietquin O. (2016).
- Deep Reinforcement Learning that Matters, Henderson P. et al. (2017).
- Automatic Bridge Bidding Using Deep Reinforcement Learning, Yeh C. and Lin H. (2016).
- Shared Autonomy via Deep Reinforcement Learning, Reddy S. et al. (2018). 🎞️
- Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review, Levine S. (2018).
- The Value Function Polytope in Reinforcement Learning, Dadashi R. et al. (2019).
- On Value Functions and the Agent-Environment Boundary, Jiang N. (2019).
- How to Train Your Robot with Deep Reinforcement Learning; Lessons We've Learned, Ibarz J. et al. (2021).
Learning from Demonstrations :mortar_board:
Imitation Learning
- `DAgger` A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, Ross S., Gordon G., Bagnell J. A. (2011).
- `QMDP-RCNN` Reinforcement Learning via Recurrent Convolutional Neural Networks, Shankar T. et al. (2016). (talk)
- `DQfD` Learning from Demonstrations for Real World Reinforcement Learning, Hester T. et al. (2017). 🎞️
- Find Your Own Way: Weakly-Supervised Segmentation of Path Proposals for Urban Autonomy, Barnes D., Maddern W., Posner I. (2016). 🎞️
- `GAIL` Generative Adversarial Imitation Learning, Ho J., Ermon S. (2016).
- From perception to decision: A data-driven approach to end-to-end motion planning for autonomous ground robots, Pfeiffer M. et al. (2017). 🎞️
- `Branched` End-to-end Driving via Conditional Imitation Learning, Codevilla F. et al. (2017). 🎞️ | talk
- `UPN` Universal Planning Networks, Srinivas A. et al. (2018). 🎞️
- `DeepMimic` DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills, Peng X. B. et al. (2018). 🎞️
- `R2P2` Deep Imitative Models for Flexible Inference, Planning, and Control, Rhinehart N. et al. (2018). 🎞️
- Learning Agile Robotic Locomotion Skills by Imitating Animals, Peng X. B. et al. (2020). 🎞️
- Deep Imitative Models for Flexible Inference, Planning, and Control, Rhinehart N., McAllister R., Levine S. (2020).
Applications to Autonomous Driving :car:
- ALVINN, an autonomous land vehicle in a neural network, Pomerleau D. (1989).
- End to End Learning for Self-Driving Cars, Bojarski M. et al. (2016). 🎞️
- End-to-end Learning of Driving Models from Large-scale Video Datasets, Xu H., Gao Y. et al. (2016). 🎞️
- End-to-End Deep Learning for Steering Autonomous Vehicles Considering Temporal Dependencies, Eraqi H. et al. (2017).
- Driving Like a Human: Imitation Learning for Path Planning using Convolutional Neural Networks, Rehder E. et al. (2017).
- Imitating Driver Behavior with Generative Adversarial Networks, Kuefler A. et al. (2017).
-
PS-GAIL
Multi-Agent Imitation Learning for Driving Simulation, Bhattacharyya R. et al. (2018). 🎞️ :octocat:
- Deep Imitation Learning for Autonomous Driving in Generic Urban Scenarios with Enhanced Safety, Chen J. et al. (2019).
Inverse Reinforcement Learning
-
Projection
Apprenticeship learning via inverse reinforcement learning, Abbeel P., Ng A. (2004).
-
MMP
Maximum margin planning, Ratliff N. et al. (2006).
-
BIRL
Bayesian inverse reinforcement learning, Ramachandran D., Amir E. (2007).
-
MEIRL
Maximum Entropy Inverse Reinforcement Learning, Ziebart B. et al. (2008).
-
LEARCH
Learning to search: Functional gradient techniques for imitation learning, Ratliff N., Silver D., Bagnell A. (2009).
-
CIOC
Continuous Inverse Optimal Control with Locally Optimal Examples, Levine S., Koltun V. (2012). 🎞️
-
MEDIRL
Maximum Entropy Deep Inverse Reinforcement Learning, Wulfmeier M. (2015).
-
GCL
Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, Finn C. et al. (2016). 🎞️
-
RIRL
Repeated Inverse Reinforcement Learning, Amin K. et al. (2017).
- Bridging the Gap Between Imitation Learning and Inverse Reinforcement Learning, Piot B. et al. (2017).
Applications to Autonomous Driving :taxi:
- Apprenticeship Learning for Motion Planning, with Application to Parking Lot Navigation, Abbeel P. et al. (2008).
- Navigate like a cabbie: Probabilistic reasoning from observed context-aware behavior, Ziebart B. et al. (2008).
- Planning-based Prediction for Pedestrians, Ziebart B. et al. (2009). 🎞️
- Learning for autonomous navigation, Bagnell A. et al. (2010).
- Learning Autonomous Driving Styles and Maneuvers from Expert Demonstration, Silver D. et al. (2012).
- Learning Driving Styles for Autonomous Vehicles from Demonstration, Kuderer M. et al. (2015).
- Learning to Drive using Inverse Reinforcement Learning and Deep Q-Networks, Sharifzadeh S. et al. (2016).
- Watch This: Scalable Cost-Function Learning for Path Planning in Urban Environments, Wulfmeier M. (2016). 🎞️
- Planning for Autonomous Cars that Leverage Effects on Human Actions, Sadigh D. et al. (2016).
- A Learning-Based Framework for Handling Dilemmas in Urban Automated Driving, Lee S., Seo S. (2017).
- Learning Trajectory Prediction with Continuous Inverse Optimal Control via Langevin Sampling of Energy-Based Models, Xu Y. et al. (2019).
- Analyzing the Suitability of Cost Functions for Explaining and Imitating Human Driving Behavior based on Inverse Reinforcement Learning, Naumann M. et al. (2020).
Motion Planning :running_man:
Search
-
Dijkstra
A Note on Two Problems in Connexion with Graphs, Dijkstra E. W. (1959).
-
A*
A Formal Basis for the Heuristic Determination of Minimum Cost Paths, Hart P. et al. (1968).
- Planning Long Dynamically-Feasible Maneuvers For Autonomous Vehicles, Likhachev M., Ferguson D. (2008).
- Optimal Trajectory Generation for Dynamic Street Scenarios in a Frenet Frame, Werling M., Kammel S. (2010). 🎞️
- 3D perception and planning for self-driving and cooperative automobiles, Stiller C., Ziegler J. (2012).
- Motion Planning under Uncertainty for On-Road Autonomous Driving, Xu W. et al. (2014).
- Monte Carlo Tree Search for Simulated Car Racing, Fischer J. et al. (2015). 🎞️
Sampling
-
RRT*
Sampling-based Algorithms for Optimal Motion Planning, Karaman S., Frazzoli E. (2011). 🎞️
-
LQG-MP
LQG-MP: Optimized Path Planning for Robots with Motion Uncertainty and Imperfect State Information, van den Berg J. et al. (2010).
- Motion Planning under Uncertainty using Differential Dynamic Programming in Belief Space, van den Berg J. et al. (2011).
- Rapidly-exploring Random Belief Trees for Motion Planning Under Uncertainty, Bry A., Roy N. (2011).
-
PRM-RL
PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning, Faust A. et al. (2017).
Optimization
- Trajectory planning for Bertha - A local, continuous method, Ziegler J. et al. (2014).
- Learning Attractor Landscapes for Learning Motor Primitives, Ijspeert A. et al. (2002).
- Online Motion Planning based on Nonlinear Model Predictive Control with Non-Euclidean Rotation Groups, Rösmann C. et al. (2020). :octocat:
Reactive
-
PF
Real-time obstacle avoidance for manipulators and mobile robots, Khatib O. (1986).
-
VFH
The Vector Field Histogram - Fast Obstacle Avoidance For Mobile Robots, Borenstein J. (1991).
-
VFH+
VFH+: Reliable Obstacle Avoidance for Fast Mobile Robots, Ulrich I., Borenstein J. (1998).
-
Velocity Obstacles
Motion planning in dynamic environments using velocity obstacles, Fiorini P., Shiller Z. (1998).
Architecture and applications
- A Review of Motion Planning Techniques for Automated Vehicles, González D. et al. (2016).
- A Survey of Motion Planning and Control Techniques for Self-driving Urban Vehicles, Paden B. et al. (2016).
- Autonomous driving in urban environments: Boss and the Urban Challenge, Urmson C. et al. (2008).
- The MIT-Cornell collision and why it happened, Fletcher L. et al. (2008).
- Making Bertha Drive - An Autonomous Journey on a Historic Route, Ziegler J. et al. (2014).