phd-bibliography
References on Optimal Control, Reinforcement Learning and Motion Planning
Bibliography
Table of contents
- Optimal Control
  - Dynamic Programming
  - Linear Programming
  - Tree-Based Planning
  - Control Theory
  - Model Predictive Control
- Safe Control
  - Robust Control
  - Risk-Averse Control
  - Value-Constrained Control
  - State-Constrained Control and Stability
  - Uncertain Dynamical Systems
- Game Theory
- Sequential Learning
  - Multi-Armed Bandit
  - Best Arm Identification
  - Black-box Optimization
- Reinforcement Learning
  - Theory
  - Value-based
  - Policy-based
    - Policy Gradient
    - Actor-critic
  - Derivative-free
  - Model-based
  - Exploration
  - Hierarchy and Temporal Abstraction
  - Partial Observability
  - Transfer
  - Multi-agent
  - Representation Learning
  - Offline
  - Other
- Learning from Demonstrations
  - Imitation Learning
    - Applications to Autonomous Driving
  - Inverse Reinforcement Learning
    - Applications to Autonomous Driving
- Motion Planning
  - Search
  - Sampling
  - Optimization
  - Reactive
  - Architecture and applications
Optimal Control :dart:
Dynamic Programming
- (book) Dynamic Programming, Bellman R. (1957).
- (book) Dynamic Programming and Optimal Control, Volumes 1 and 2, Bertsekas D. (1995).
- (book) Markov Decision Processes - Discrete Stochastic Dynamic Programming, Puterman M. (1995).
- An Upper Bound on the Loss from Approximate Optimal-Value Functions, Singh S., Yee R. (1994).
- Stochastic optimization of sailing trajectories in an upwind regatta, Dalang R. et al. (2015).
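
The core primitive behind these references is the Bellman optimality backup. As a minimal illustration, a sketch of tabular value iteration for a known finite MDP; the numpy encoding of the MDP is an assumption for illustration, not taken from any of the papers above:

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Tabular value iteration for a known finite MDP.

    P: transition tensor of shape (S, A, S), rows summing to 1.
    R: reward matrix of shape (S, A).
    Returns the optimal value function and a greedy policy.
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * P @ V   # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
        V_new = Q.max(axis=1)   # Bellman optimality backup
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V, Q.argmax(axis=1)
```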
Linear Programming
- (book) Markov Decision Processes - Discrete Stochastic Dynamic Programming, Puterman M. (1995).
- **REPS** Relative Entropy Policy Search, Peters J. et al. (2010).
Tree-Based Planning
- **ExpectiMinimax** Optimal strategy in games with chance nodes, Melkó E., Nagy B. (2007).
- **Sparse sampling** A sparse sampling algorithm for near-optimal planning in large Markov decision processes, Kearns M. et al. (2002).
- **MCTS** Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search, Coulom R. (2006).
- **UCT** Bandit based Monte-Carlo Planning, Kocsis L., Szepesvári C. (2006).
- Bandit Algorithms for Tree Search, Coquelin P-A., Munos R. (2007).
- **OPD** Optimistic Planning for Deterministic Systems, Hren J., Munos R. (2008).
- **OLOP** Open Loop Optimistic Planning, Bubeck S., Munos R. (2010).
- **SOOP** Optimistic Planning for Continuous-Action Deterministic Systems, Buşoniu L. et al. (2011).
- **OPSS** Optimistic planning for sparsely stochastic systems, Buşoniu L., Munos R., De Schutter B., Babuska R. (2011).
- **HOOT** Sample-Based Planning for Continuous Action Markov Decision Processes, Mansley C., Weinstein A., Littman M. (2011).
- **HOLOP** Bandit-Based Planning and Learning in Continuous-Action Markov Decision Processes, Weinstein A., Littman M. (2012).
- **BRUE** Simple Regret Optimization in Online Planning for Markov Decision Processes, Feldman Z., Domshlak C. (2014).
- **LGP** Logic-Geometric Programming: An Optimization-Based Approach to Combined Task and Motion Planning, Toussaint M. (2015). 🎞️
- **AlphaGo** Mastering the game of Go with deep neural networks and tree search, Silver D. et al. (2016).
- **AlphaGo Zero** Mastering the game of Go without human knowledge, Silver D. et al. (2017).
- **AlphaZero** Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, Silver D. et al. (2017).
- **TrailBlazer** Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning, Grill J. B., Valko M., Munos R. (2017).
- **MCTSnets** Learning to search with MCTSnets, Guez A. et al. (2018).
- **ADI** Solving the Rubik's Cube Without Human Knowledge, McAleer S. et al. (2018).
- **OPC/SOPC** Continuous-action planning for discounted infinite-horizon nonlinear optimal control with Lipschitz values, Buşoniu L., Pall E., Munos R. (2018).
- Real-time tree search with pessimistic scenarios: Winning the NeurIPS 2018 Pommerman Competition, Osogami T., Takahashi T. (2019).
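
Most of the planners above descend from the UCT rule of Kocsis & Szepesvári: treat each node's children as bandit arms and select them optimistically. A minimal sketch of that selection step; the `Node` bookkeeping class is a hypothetical stand-in for a full tree implementation:

```python
import math
from dataclasses import dataclass

@dataclass
class Node:
    visits: int = 0          # N(child): times this child was selected
    value_sum: float = 0.0   # sum of rollout returns backed up through it

def uct_select(children, c=math.sqrt(2)):
    """Pick the child maximizing mean value + UCB1 exploration bonus."""
    total = sum(child.visits for child in children)
    def ucb(child):
        if child.visits == 0:
            return float("inf")  # visit unexplored children first
        return (child.value_sum / child.visits
                + c * math.sqrt(math.log(total) / child.visits))
    return max(children, key=ucb)
```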
Control Theory
- (book) The Mathematical Theory of Optimal Processes, L. S. Pontryagin, Boltyanskii V. G., Gamkrelidze R. V., and Mishchenko E. F. (1962).
- (book) Constrained Control and Estimation, Goodwin G. (2005).
- **PI²** A Generalized Path Integral Control Approach to Reinforcement Learning, Theodorou E. et al. (2010).
- **PI²-CMA** Path Integral Policy Improvement with Covariance Matrix Adaptation, Stulp F., Sigaud O. (2010).
- **iLQG** A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems, Todorov E. (2005). :octocat:
- **iLQG+** Synthesis and stabilization of complex behaviors through online trajectory optimization, Tassa Y. (2012).
Model Predictive Control
- (book) Model Predictive Control, Camacho E. (1995).
- (book) Predictive Control With Constraints, Maciejowski J. M. (2002).
- Linear Model Predictive Control for Lane Keeping and Obstacle Avoidance on Low Curvature Roads, Turri V. et al. (2013).
- **MPCC** Optimization-based autonomous racing of 1:43 scale RC cars, Liniger A. et al. (2014). 🎞️ | 🎞️
- **MIQP** Optimal trajectory planning for autonomous driving integrating logical constraints: An MIQP perspective, Qian X., Altché F., Bender P., Stiller C., de La Fortelle A. (2016).
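
All of these share the receding-horizon principle: optimize a control sequence over a finite horizon, apply only the first input, then re-plan. A sketch using random shooting as the inner optimizer (the papers above use QP/NLP solvers instead; the `dynamics` and `cost` callables are assumed user-supplied models):

```python
import numpy as np

def mpc_step(state, dynamics, cost, horizon=15, n_candidates=500,
             action_dim=2, rng=None):
    """One receding-horizon step: sample candidate action sequences, roll each
    through the model, and return the first action of the cheapest sequence.
    dynamics(s, a) -> s' and cost(s, a) -> float are assumed user-supplied."""
    rng = rng or np.random.default_rng()
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, action_dim))
    best_cost, best_first_action = np.inf, None
    for seq in candidates:
        s, total = state, 0.0
        for a in seq:
            total += cost(s, a)
            s = dynamics(s, a)
        if total < best_cost:
            best_cost, best_first_action = total, seq[0]
    return best_first_action  # apply this action, then re-plan next step
```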
Safe Control :lock:
Robust Control
- Minimax analysis of stochastic problems, Shapiro A., Kleywegt A. (2002).
- **Robust DP** Robust Dynamic Programming, Iyengar G. (2005).
- Robust Planning and Optimization, Laumanns M. (2011). (lecture notes)
- Robust Markov Decision Processes, Wiesemann W., Kuhn D., Rustem B. (2012).
- Safe and Robust Learning Control with Gaussian Processes, Berkenkamp F., Schoellig A. (2015). 🎞️
- **Tube-MPPI** Robust Sampling Based Model Predictive Control with Sparse Objective Information, Williams G. et al. (2018). 🎞️
- Safe Learning in Robotics: From Learning-Based Control to Safe Reinforcement Learning, Brunke L. et al. (2021). :octocat:
Risk-Averse Control
- A Comprehensive Survey on Safe Reinforcement Learning, García J., Fernández F. (2015).
- **RA-QMDP** Risk-averse Behavior Planning for Autonomous Driving under Uncertainty, Naghshvar M. et al. (2018).
- **StoROO** X-Armed Bandits: Optimizing Quantiles and Other Risks, Torossian L., Garivier A., Picheny V. (2019).
- Worst Cases Policy Gradients, Tang Y. C. et al. (2019).
- Model-Free Risk-Sensitive Reinforcement Learning, Delétang G. et al. (2021).
- Optimal Thompson Sampling strategies for support-aware CVaR bandits, Baudry D., Gautron R., Kaufmann E., Maillard O. (2021).
Value-Constrained Control
- **ICS** Will the Driver Seat Ever Be Empty?, Fraichard T. (2014).
- **SafeOPT** Safe Controller Optimization for Quadrotors with Gaussian Processes, Berkenkamp F., Schoellig A., Krause A. (2015). 🎞️ :octocat:
- **SafeMDP** Safe Exploration in Finite Markov Decision Processes with Gaussian Processes, Turchetta M., Berkenkamp F., Krause A. (2016). :octocat:
- **RSS** On a Formal Model of Safe and Scalable Self-driving Cars, Shalev-Shwartz S. et al. (2017).
- **CPO** Constrained Policy Optimization, Achiam J., Held D., Tamar A., Abbeel P. (2017). :octocat:
- **RCPO** Reward Constrained Policy Optimization, Tessler C., Mankowitz D., Mannor S. (2018).
- **BFTQ** A Fitted-Q Algorithm for Budgeted MDPs, Carrara N. et al. (2018).
- **SafeMPC** Learning-based Model Predictive Control for Safe Exploration, Koller T., Berkenkamp F., Turchetta M., Krause A. (2018).
- **CCE** Constrained Cross-Entropy Method for Safe Reinforcement Learning, Wen M., Topcu U. (2018). :octocat:
- **LTL-RL** Reinforcement Learning with Probabilistic Guarantees for Autonomous Driving, Bouton M. et al. (2019).
- Safe Reinforcement Learning with Scene Decomposition for Navigating Complex Urban Environments, Bouton M. et al. (2019). :octocat:
- Batch Policy Learning under Constraints, Le H., Voloshin C., Yue Y. (2019).
- Value constrained model-free continuous control, Bohez S. et al. (2019). 🎞️
- Safely Learning to Control the Constrained Linear Quadratic Regulator, Dean S. et al. (2019).
- Learning to Walk in the Real World with Minimal Human Effort, Ha S. et al. (2020). 🎞️
- Responsive Safety in Reinforcement Learning by PID Lagrangian Methods, Stooke A., Achiam J., Abbeel P. (2020). :octocat:
- **Envelope MOQ-Learning** A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation, Yang R. et al. (2019).
State-Constrained Control and Stability
- **HJI-reachability** Safe learning for control: Combining disturbance estimation, reachability analysis and reinforcement learning with systematic exploration, Heidenreich C. (2017).
- **MPC-HJI** On Infusing Reachability-Based Safety Assurance within Probabilistic Planning Frameworks for Human-Robot Vehicle Interactions, Leung K. et al. (2018).
- A General Safety Framework for Learning-Based Control in Uncertain Robotic Systems, Fisac J. et al. (2017). 🎞️
- Safe Model-based Reinforcement Learning with Stability Guarantees, Berkenkamp F. et al. (2017).
- **Lyapunov-Net** Safe Interactive Model-Based Learning, Gallieri M. et al. (2019).
- Enforcing robust control guarantees within neural network policies, Donti P. et al. (2021). :octocat:
- **ATACOM** Robot Reinforcement Learning on the Constraint Manifold, Liu P. et al. (2021).
Uncertain Dynamical Systems
- Simulation of Controlled Uncertain Nonlinear Systems, Tibken B., Hofer E. (1995).
- Trajectory computation of dynamic uncertain systems, Adrot O., Flaus J-M. (2002).
- Simulation of Uncertain Dynamic Systems Described By Interval Models: a Survey, Puig V. et al. (2005).
- Design of interval observers for uncertain dynamical systems, Efimov D., Raïssi T. (2016).
Game Theory :spades:
- Hierarchical Game-Theoretic Planning for Autonomous Vehicles, Fisac J. et al. (2018).
- Efficient Iterative Linear-Quadratic Approximations for Nonlinear Multi-Player General-Sum Differential Games, Fridovich-Keil D. et al. (2019). 🎞️
Sequential Learning :shoe:
- Prediction, Learning and Games, Cesa-Bianchi N., Lugosi G. (2006).
Multi-Armed Bandit :slot_machine:
- **TS** On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, Thompson W. (1933).
- Exploration and Exploitation in Organizational Learning, March J. (1991).
- **UCB1 / UCB2** Finite-time Analysis of the Multiarmed Bandit Problem, Auer P., Cesa-Bianchi N., Fischer P. (2002).
- **Empirical Bernstein / UCB-V** Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, Audibert J-Y., Munos R., Szepesvári C. (2009).
- Empirical Bernstein Bounds and Sample Variance Penalization, Maurer A., Pontil M. (2009).
- An Empirical Evaluation of Thompson Sampling, Chapelle O., Li L. (2011).
- **kl-UCB** The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond, Garivier A., Cappé O. (2011).
- **KL-UCB** Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation, Cappé O. et al. (2013).
- **IDS** Information Directed Sampling and Bandits with Heteroscedastic Noise, Kirschner J., Krause A. (2018).
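
As a reference point, the UCB1 index of Auer et al. fits in a few lines; the caller is assumed to maintain pull counts and empirical means and to update them after each reward:

```python
import math

def ucb1(counts, means, t):
    """UCB1 (Auer et al., 2002): play each arm once, then pick the arm
    maximizing empirical mean + sqrt(2 ln t / n_pulls)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # initialization: pull every arm once
    return max(range(len(counts)),
               key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
```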
Contextual
- **LinUCB** A Contextual-Bandit Approach to Personalized News Article Recommendation, Li L. et al. (2010).
- **OFUL** Improved Algorithms for Linear Stochastic Bandits, Abbasi-Yadkori Y., Pál D., Szepesvári C. (2011).
- Contextual Bandits with Linear Payoff Functions, Chu W. et al. (2011).
- Self-normalization techniques for streaming confident regression, Maillard O.-A. (2017).
- Learning from Delayed Outcomes via Proxies with Applications to Recommender Systems, Mann T. et al. (2018). (prediction setting)
- Weighted Linear Bandits for Non-Stationary Environments, Russac Y. et al. (2019).
- Linear bandits with Stochastic Delayed Feedback, Vernade C. et al. (2020).
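
A sketch of the shared-parameter linear UCB idea behind the LinUCB/OFUL entries above: ridge regression on observed rewards plus an ellipsoidal exploration bonus. The hyperparameters and the arm-feature encoding are illustrative assumptions:

```python
import numpy as np

class LinUCB:
    """Linear UCB bandit: one shared ridge-regression estimate, arms given
    by feature vectors x in R^d."""
    def __init__(self, d, alpha=1.0, lam=1.0):
        self.A = lam * np.eye(d)   # regularized Gram matrix: sum x x^T + lam*I
        self.b = np.zeros(d)       # sum of reward-weighted features
        self.alpha = alpha

    def select(self, arm_features):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b     # ridge estimate of the reward parameter
        scores = [x @ theta + self.alpha * np.sqrt(x @ A_inv @ x)
                  for x in arm_features]
        return int(np.argmax(scores))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x
```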
Best Arm Identification :muscle:
- **Successive Elimination** Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems, Even-Dar E. et al. (2006).
- **LUCB** PAC Subset Selection in Stochastic Multi-armed Bandits, Kalyanakrishnan S. et al. (2012).
- **UGapE** Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence, Gabillon V., Ghavamzadeh M., Lazaric A. (2012).
- **Sequential Halving** Almost Optimal Exploration in Multi-Armed Bandits, Karnin Z. et al. (2013).
- **M-LUCB / M-Racing** Maximin Action Identification: A New Bandit Framework for Games, Garivier A., Kaufmann E., Koolen W. (2016).
- **Track-and-Stop** Optimal Best Arm Identification with Fixed Confidence, Garivier A., Kaufmann E. (2016).
- **LUCB-micro** Structured Best Arm Identification with Fixed Confidence, Huang R. et al. (2017).
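
Among these, Sequential Halving is simple enough to sketch directly: split the budget across log2(K) rounds, pull each surviving arm equally, keep the best half. The `pull` reward oracle is an assumed interface:

```python
import math
import numpy as np

def sequential_halving(pull, n_arms, budget):
    """Fixed-budget best arm identification (Karnin et al., 2013).
    pull(arm) -> float is an assumed noisy-reward oracle."""
    arms = list(range(n_arms))
    rounds = math.ceil(math.log2(n_arms))
    for _ in range(rounds):
        if len(arms) == 1:
            break
        pulls = max(1, budget // (len(arms) * rounds))
        means = [np.mean([pull(a) for _ in range(pulls)]) for a in arms]
        order = np.argsort(means)[::-1]                    # best arms first
        arms = [arms[i] for i in order[:max(1, len(arms) // 2)]]
    return arms[0]
```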
Black-box Optimization :black_large_square:
- **GP-UCB** Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design, Srinivas N., Krause A., Kakade S., Seeger M. (2009).
- **HOO** X-Armed Bandits, Bubeck S., Munos R., Stoltz G., Szepesvári C. (2009).
- **DOO/SOO** Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness, Munos R. (2011).
- **StoOO** From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning, Munos R. (2014).
- **StoSOO** Stochastic Simultaneous Optimistic Optimization, Valko M., Carpentier A., Munos R. (2013).
- **POO** Black-box optimization of noisy functions with unknown smoothness, Grill J-B., Valko M., Munos R. (2015).
- **EI-GP** Bayesian Optimization in AlphaGo, Chen Y. et al. (2018).
Reinforcement Learning :robot:
- Reinforcement learning: A survey, Kaelbling L. et al. (1996).
Theory :books:
- Expected mistake bound model for on-line reinforcement learning, Fiechter C-N. (1997).
- **UCRL2** Near-optimal Regret Bounds for Reinforcement Learning, Jaksch T. (2010).
- **PSRL** Why is Posterior Sampling Better than Optimism for Reinforcement Learning?, Osband I., Van Roy B. (2016).
- **UCBVI** Minimax Regret Bounds for Reinforcement Learning, Azar M., Osband I., Munos R. (2017).
- **Q-Learning-UCB** Is Q-Learning Provably Efficient?, Jin C., Allen-Zhu Z., Bubeck S., Jordan M. (2018).
- **LSVI-UCB** Provably Efficient Reinforcement Learning with Linear Function Approximation, Jin C., Yang Z., Wang Z., Jordan M. (2019).
- Lipschitz Continuity in Model-based Reinforcement Learning, Asadi K. et al. (2018).
- On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces, Yang Z., Jin C., Wang Z., Wang M., Jordan M. (2021).
Generative Model
- **QVI** On the Sample Complexity of Reinforcement Learning with a Generative Model, Azar M., Munos R., Kappen B. (2012).
- Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal, Agarwal A. et al. (2019).
Policy Gradient
- Policy Gradient Methods for Reinforcement Learning with Function Approximation, Sutton R. et al. (2000).
- Approximately Optimal Approximate Reinforcement Learning, Kakade S., Langford J. (2002).
- On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift, Agarwal A. et al. (2019).
- PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning, Agarwal A. et al. (2020).
- Is the Policy Gradient a Gradient?, Nota C., Thomas P. S. (2020).
Linear Systems
- PAC Adaptive Control of Linear Systems, Fiechter C.-N. (1997).
- **OFU-LQ** Regret Bounds for the Adaptive Control of Linear Quadratic Systems, Abbasi-Yadkori Y., Szepesvári C. (2011).
- **TS-LQ** Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems, Abeille M., Lazaric A. (2018).
- Exploration-Exploitation with Thompson Sampling in Linear Systems, Abeille M. (2017). (PhD thesis)
- **Coarse-ID** On the Sample Complexity of the Linear Quadratic Regulator, Dean S., Mania H., Matni N., Recht B., Tu S. (2017).
- Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator, Dean S. et al. (2018).
- Robust exploration in linear quadratic reinforcement learning, Umenberger J. et al. (2019).
- Online Control with Adversarial Disturbances, Agarwal N. et al. (2019).
- Logarithmic Regret for Online Control, Agarwal N. et al. (2019).
Value-based :chart_with_upwards_trend:
- **NFQ** Neural fitted Q iteration - First experiences with a data efficient neural Reinforcement Learning method, Riedmiller M. (2005).
- **DQN** Playing Atari with Deep Reinforcement Learning, Mnih V. et al. (2013). 🎞️
- **DDQN** Deep Reinforcement Learning with Double Q-learning, van Hasselt H., Silver D. et al. (2015).
- **DDDQN** Dueling Network Architectures for Deep Reinforcement Learning, Wang Z. et al. (2015). 🎞️
- **PDDDQN** Prioritized Experience Replay, Schaul T. et al. (2015).
- **NAF** Continuous Deep Q-Learning with Model-based Acceleration, Gu S. et al. (2016).
- **Rainbow** Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel M. et al. (2017).
- **Ape-X DQfD** Observe and Look Further: Achieving Consistent Performance on Atari, Pohlen T. et al. (2018). 🎞️
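
All of these approximate, with neural networks and replay tricks, the one-step target of tabular Q-learning. For reference, the tabular ancestor; the gym-like `env` API (integer states, `reset`/`step`) is an assumption for illustration:

```python
import numpy as np

def q_learning(env, n_episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning with an epsilon-greedy policy: the target
    r + gamma * max_a' Q(s', a') that DQN and its variants approximate."""
    Q = np.zeros((env.n_states, env.n_actions))
    for _ in range(n_episodes):
        s, done = env.reset(), False
        while not done:
            a = (np.random.randint(env.n_actions) if np.random.rand() < eps
                 else int(Q[s].argmax()))
            s2, r, done = env.step(a)
            target = r + (0.0 if done else gamma * Q[s2].max())
            Q[s, a] += alpha * (target - Q[s, a])   # one-step TD update
            s = s2
    return Q
```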
Policy-based :muscle:
Policy gradient
- **REINFORCE** Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams R. (1992).
- **Natural Gradient** A Natural Policy Gradient, Kakade S. (2002).
- Policy Gradient Methods for Robotics, Peters J., Schaal S. (2006).
- **TRPO** Trust Region Policy Optimization, Schulman J. et al. (2015). 🎞️
- **PPO** Proximal Policy Optimization Algorithms, Schulman J. et al. (2017). 🎞️
- **DPPO** Emergence of Locomotion Behaviours in Rich Environments, Heess N. et al. (2017). 🎞️
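
The common ancestor of these methods is Williams' REINFORCE estimator. A sketch for a linear-softmax policy; the `(features, action, reward)` episode encoding, with per-action features of shape `(n_actions, d)`, is an assumption for illustration:

```python
import numpy as np

def reinforce_update(theta, episode, gamma=0.99, lr=0.01):
    """One REINFORCE update (Williams, 1992):
    theta <- theta + lr * G_t * grad log pi(a_t | s_t)."""
    # Compute discounted returns-to-go G_t.
    returns, G = [], 0.0
    for _, _, r in reversed(episode):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    for (phi, a, _), G in zip(episode, returns):
        logits = phi @ theta
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        grad_logp = phi[a] - probs @ phi   # gradient of log softmax
        theta = theta + lr * G * grad_logp
    return theta
```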
Actor-critic
- **AC** Policy Gradient Methods for Reinforcement Learning with Function Approximation, Sutton R. et al. (1999).
- **NAC** Natural Actor-Critic, Peters J. et al. (2005).
- **DPG** Deterministic Policy Gradient Algorithms, Silver D. et al. (2014).
- **DDPG** Continuous Control With Deep Reinforcement Learning, Lillicrap T. et al. (2015). 🎞️ 1 | 2 | 3 | 4
- **MACE** Terrain-Adaptive Locomotion Skills Using Deep Reinforcement Learning, Peng X., Berseth G., van de Panne M. (2016). 🎞️ | 🎞️
- **A3C** Asynchronous Methods for Deep Reinforcement Learning, Mnih V. et al. (2016). 🎞️ 1 | 2 | 3
- **SAC** Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja T. et al. (2018). 🎞️
- **MPO** Maximum a Posteriori Policy Optimisation, Abdolmaleki A. et al. (2018).
- A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms, Zhang S., Laroche R. et al. (2020).
Derivative-free
- **CEM** Learning Tetris Using the Noisy Cross-Entropy Method, Szita I., Lőrincz A. (2006). 🎞️
- **CMA-ES** Completely Derandomized Self-Adaptation in Evolution Strategies, Hansen N., Ostermeier A. (2001).
- **NEAT** Evolving Neural Networks through Augmenting Topologies, Stanley K. (2002). 🎞️
- **iCEM** Sample-efficient Cross-Entropy Method for Real-time Planning, Pinneri C. et al. (2020).
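
A sketch of the plain cross-entropy method that the CEM/iCEM entries build on: sample a Gaussian population, keep the elites, refit the mean and standard deviation. The black-box `score` objective is an assumed interface:

```python
import numpy as np

def cross_entropy_method(score, dim, n_iters=50, pop=100, elite_frac=0.2, seed=0):
    """Maximize score(x) over R^dim by iterated elite refitting."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(n_iters):
        samples = rng.normal(mean, std, size=(pop, dim))
        scores = np.array([score(x) for x in samples])
        elites = samples[np.argsort(scores)[-n_elite:]]   # top fraction
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + 1e-6                   # avoid collapse
    return mean
```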
Model-based :world_map:
- **Dyna** Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, Sutton R. (1990).
- **PILCO** PILCO: A Model-Based and Data-Efficient Approach to Policy Search, Deisenroth M., Rasmussen C. (2011). (talk)
- **DBN** Probabilistic MDP-behavior planning for cars, Brechtel S. et al. (2011).
- **GPS** End-to-End Training of Deep Visuomotor Policies, Levine S. et al. (2015). 🎞️
- **DeepMPC** DeepMPC: Learning Deep Latent Features for Model Predictive Control, Lenz I. et al. (2015). 🎞️
- **SVG** Learning Continuous Control Policies by Stochastic Value Gradients, Heess N. et al. (2015). 🎞️
- **FARNN** Nonlinear Systems Identification Using Deep Dynamic Neural Networks, Ogunmolu O. et al. (2016). :octocat:
- Optimal control with learned local models: Application to dexterous manipulation, Kumar V. et al. (2016). 🎞️
- **BPTT** Long-term Planning by Short-term Prediction, Shalev-Shwartz S. et al. (2016). 🎞️ 1 | 2
- Deep visual foresight for planning robot motion, Finn C., Levine S. (2016). 🎞️
- **VIN** Value Iteration Networks, Tamar A. et al. (2016). 🎞️
- **VPN** Value Prediction Network, Oh J. et al. (2017).
- **DistGBP** Model-Based Planning with Discrete and Continuous Actions, Henaff M. et al. (2017). 🎞️ 1 | 2
- Prediction and Control with Temporal Segment Models, Mishra N. et al. (2017).
- **Predictron** The Predictron: End-To-End Learning and Planning, Silver D. et al. (2017). 🎞️
- **MPPI** Information Theoretic MPC for Model-Based Reinforcement Learning, Williams G. et al. (2017). :octocat: 🎞️
- Learning Real-World Robot Policies by Dreaming, Piergiovanni A. et al. (2018).
- Coupled Longitudinal and Lateral Control of a Vehicle using Deep Learning, Devineau G., Polack P., Altché F., Moutarde F. (2018). 🎞️
- **PlaNet** Learning Latent Dynamics for Planning from Pixels, Hafner D. et al. (2018). 🎞️
- **NeuralLander** Neural Lander: Stable Drone Landing Control using Learned Dynamics, Shi G. et al. (2018). 🎞️
- **DBN+POMCP** Towards Human-Like Prediction and Decision-Making for Automated Vehicles in Highway Scenarios, Sierra Gonzalez D. (2019).
- Planning with Goal-Conditioned Policies, Nasiriany S. et al. (2019). 🎞️
- **MuZero** Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model, Schrittwieser J. et al. (2019). :octocat:
- **BADGR** BADGR: An Autonomous Self-Supervised Learning-Based Navigation System, Kahn G., Abbeel P., Levine S. (2020). 🎞️ :octocat:
- **H-UCRL** Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning, Curi S., Berkenkamp F., Krause A. (2020). :octocat:
Exploration :tent:
- Combating Reinforcement Learning's Sisyphean Curse with Intrinsic Fear, Lipton Z. et al. (2016).
- **Pseudo-count** Unifying Count-Based Exploration and Intrinsic Motivation, Bellemare M. et al. (2016). 🎞️
- **HER** Hindsight Experience Replay, Andrychowicz M. et al. (2017). 🎞️
- **VHER** Visual Hindsight Experience Replay, Sahni H. et al. (2019).
- **RND** Exploration by Random Network Distillation, Burda Y. et al. (OpenAI) (2018). 🎞️
- **Go-Explore** Go-Explore: a New Approach for Hard-Exploration Problems, Ecoffet A. et al. (Uber) (2018). 🎞️
- **C51-IDS** Information-Directed Exploration for Deep Reinforcement Learning, Nikolov N., Kirschner J., Berkenkamp F., Krause A. (2019). :octocat:
- **Plan2Explore** Planning to Explore via Self-Supervised World Models, Sekar R. et al. (2020). 🎞️ :octocat:
- **RIDE** RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments, Raileanu R., Rocktäschel T. (2020). :octocat:
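
Many of these methods reduce to augmenting the reward with a novelty bonus. The simplest instance is a tabular count-based bonus; the pseudo-count papers above generalize the counts to density models over high-dimensional observations:

```python
import numpy as np
from collections import defaultdict

class CountBonus:
    """Count-based exploration bonus: r + beta / sqrt(N(s)).
    Tabular (hashable) states are an assumption for illustration."""
    def __init__(self, beta=0.1):
        self.counts = defaultdict(int)
        self.beta = beta

    def reward(self, state, extrinsic_reward):
        self.counts[state] += 1
        bonus = self.beta / np.sqrt(self.counts[state])
        return extrinsic_reward + bonus
```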
Hierarchy and Temporal Abstraction :clock2:
- Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning, Sutton R. et al. (1999).
- Intrinsically motivated learning of hierarchical collections of skills, Barto A. et al. (2004).
- **OC** The Option-Critic Architecture, Bacon P-L., Harb J., Precup D. (2016).
- Learning and Transfer of Modulated Locomotor Controllers, Heess N. et al. (2016). 🎞️
- Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving, Shalev-Shwartz S. et al. (2016).
- **FuNs** FeUdal Networks for Hierarchical Reinforcement Learning, Vezhnevets A. et al. (2017).
- Combining Neural Networks and Tree Search for Task and Motion Planning in Challenging Environments, Paxton C. et al. (2017). 🎞️
- **DeepLoco** DeepLoco: Dynamic Locomotion Skills Using Hierarchical Deep Reinforcement Learning, Peng X. et al. (2017). 🎞️ | 🎞️
- Hierarchical Policy Design for Sample-Efficient Learning of Robot Table Tennis Through Self-Play, Mahjourian R. et al. (2018). 🎞️
- **DAC** DAC: The Double Actor-Critic Architecture for Learning Options, Zhang S., Whiteson S. (2019).
- Multi-Agent Manipulation via Locomotion using Hierarchical Sim2Real, Nachum O. et al. (2019). 🎞️
- SoftCon: Simulation and Control of Soft-Bodied Animals with Biomimetic Actuators, Min S. et al. (2020). 🎞️ :octocat:
- **H-REIL** Reinforcement Learning based Control of Imitative Policies for Near-Accident Driving, Cao Z. et al. (2020). 🎞️ 1, 2
Partial Observability :eye:
- **PBVI** Point-based Value Iteration: An anytime algorithm for POMDPs, Pineau J. et al. (2003).
- **cPBVI** Point-Based Value Iteration for Continuous POMDPs, Porta J. et al. (2006).
- **POMCP** Monte-Carlo Planning in Large POMDPs, Silver D., Veness J. (2010).
- A POMDP Approach to Robot Motion Planning under Uncertainty, Du Y. et al. (2010).
- Probabilistic Online POMDP Decision Making for Lane Changes in Fully Automated Driving, Ulbrich S., Maurer M. (2013).
- Solving Continuous POMDPs: Value Iteration with Incremental Learning of an Efficient Space Representation, Brechtel S. et al. (2013).
- Probabilistic Decision-Making under Uncertainty for Autonomous Driving using Continuous POMDPs, Brechtel S. et al. (2014).
- **MOMDP** Intention-Aware Motion Planning, Bandyopadhyay T. et al. (2013).
- **DNC** Hybrid computing using a neural network with dynamic external memory, Graves A. et al. (2016). 🎞️
- The value of inferring the internal state of traffic participants for autonomous freeway driving, Sunberg Z. et al. (2017).
- Belief State Planning for Autonomously Navigating Urban Intersections, Bouton M., Cosgun A., Kochenderfer M. (2017).
- Scalable Decision Making with Sensor Occlusions for Autonomous Driving, Bouton M. et al. (2018).
- Probabilistic Decision-Making at Road Intersections: Formulation and Quantitative Evaluation, Barbier M., Laugier C., Simonin O., Ibanez J. (2018).
- Beauty and the Beast: Optimal Methods Meet Learning for Drone Racing, Kaufmann E. et al. (2018). 🎞️
- **social perception** Behavior Planning of Autonomous Cars with Social Perception, Sun L. et al. (2019).
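
The primitive shared by these planners is the Bayesian belief update over hidden states. A discrete-POMDP sketch; the tensor encodings of the transition and observation models are assumptions for illustration:

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes filter for a discrete POMDP:
    b'(s') ∝ O[a, s', o] * sum_s T[s, a, s'] * b(s).
    T has shape (S, A, S); O has shape (A, S, n_obs)."""
    predicted = b @ T[:, a, :]       # prediction: sum_s b(s) T(s' | s, a)
    b_new = O[a, :, o] * predicted   # correction by observation likelihood
    return b_new / b_new.sum()       # normalize (assumes P(o) > 0)
```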
Transfer :earth_americas:
- **IT&E** Robots that can adapt like animals, Cully A., Clune J., Tarapore D., Mouret J-B. (2014). 🎞️
- **MAML** Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, Finn C., Abbeel P., Levine S. (2017). 🎞️
- Virtual to Real Reinforcement Learning for Autonomous Driving, Pan X. et al. (2017). 🎞️
- Sim-to-Real: Learning Agile Locomotion For Quadruped Robots, Tan J. et al. (2018). 🎞️
- **ME-TRPO** Model-Ensemble Trust-Region Policy Optimization, Kurutach T. et al. (2018). 🎞️
- Kickstarting Deep Reinforcement Learning, Schmitt S. et al. (2018).
- Learning Dexterous In-Hand Manipulation, OpenAI (2018). 🎞️
- **GrBAL / ReBAL** Learning to Adapt in Dynamic, Real-World Environments Through Meta-Reinforcement Learning, Nagabandi A. et al. (2018). 🎞️
- Learning agile and dynamic motor skills for legged robots, Hwangbo J. et al. (ETH Zurich / Intel ISL) (2019). 🎞️
- Robust Recovery Controller for a Quadrupedal Robot using Deep Reinforcement Learning, Lee J., Hwangbo J., Hutter M. (ETH Zurich RSL) (2019).
- **IT&E** Learning and adapting quadruped gaits with the "Intelligent Trial & Error" algorithm, Dalin E., Desreumaux P., Mouret J-B. (2019). 🎞️
- **FAMLE** Fast Online Adaptation in Robotics through Meta-Learning Embeddings of Simulated Priors, Kaushik R., Anne T., Mouret J-B. (2020). 🎞️
- Robust Deep Reinforcement Learning against Adversarial Perturbations on Observations, Zhang H. et al. (2020). :octocat:
- Learning quadrupedal locomotion over challenging terrain, Lee J. et al. (2020). 🎞️
- **PACOH** PACOH: Bayes-Optimal Meta-Learning with PAC-Guarantees, Rothfuss J., Fortuin V., Josifoski M., Krause A. (2021).
- Model-Based Domain Generalization, Robey A. et al. (2021).
- **SimGAN** SimGAN: Hybrid Simulator Identification for Domain Adaptation via Adversarial Reinforcement Learning, Jiang Y. et al. (2021). 🎞️ :octocat:
- Learning robust perceptive locomotion for quadrupedal robots in the wild, Miki T. et al. (2022).
Multi-agent :two_men_holding_hands:
- **Minimax-Q** Markov games as a framework for multi-agent reinforcement learning, Littman M. (1994).
- Autonomous Agents Modelling Other Agents: A Comprehensive Survey and Open Problems, Albrecht S., Stone P. (2017).
- **MILP** Time-optimal coordination of mobile robots along specified paths, Altché F. et al. (2016). 🎞️
- **MIQP** An Algorithm for Supervised Driving of Cooperative Semi-Autonomous Vehicles, Altché F. et al. (2017). 🎞️
- **SA-CADRL** Socially Aware Motion Planning with Deep Reinforcement Learning, Chen Y. et al. (2017). 🎞️
- Multipolicy decision-making for autonomous driving via changepoint-based behavior prediction: Theory and experiment, Galceran E. et al. (2017).
- Online decision-making for scalable autonomous systems, Wray K. et al. (2017).
- **MAgent** MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence, Zheng L. et al. (2017). 🎞️
- Cooperative Motion Planning for Non-Holonomic Agents with Value Iteration Networks, Rehder E. et al. (2017).
- **MPPO** Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning, Long P. et al. (2017). 🎞️
- **COMA** Counterfactual Multi-Agent Policy Gradients, Foerster J. et al. (2017).
- **MADDPG** Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, Lowe R. et al. (2017). :octocat:
- **FTW** Human-level performance in first-person multiplayer games with population-based deep reinforcement learning, Jaderberg M. et al. (2018). 🎞️
- Towards Learning Multi-agent Negotiations via Self-Play, Tang Y. C. (2020).
- **MAPPO** The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games, Yu C. et al. (2021). [:octocat:](https://github.com/marlbenchmark/on-policy)
- Many-agent Reinforcement Learning, Yang Y. (2021).
Representation Learning
- Variable Resolution Discretization in Optimal Control, Munos R., Moore A. (2002). 🎞️
- **DeepDriving** DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving, Chen C. et al. (2015). 🎞️
- On the Sample Complexity of End-to-end Training vs. Semantic Abstraction Training, Shalev-Shwartz S. et al. (2016).
- Learning sparse representations in reinforcement learning with sparse coding, Le L., Kumaraswamy M., White M. (2017).
- World Models, Ha D., Schmidhuber J. (2018). 🎞️ :octocat:
- Learning to Drive in a Day, Kendall A. et al. (2018). 🎞️
- **MERLIN** Unsupervised Predictive Memory in a Goal-Directed Agent, Wayne G. et al. (2018). 🎞️ 1 | 2 | 3 | 4 | 5 | 6
- Variational End-to-End Navigation and Localization, Amini A. et al. (2018). 🎞️
- Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks, Lee M. et al. (2018). 🎞️
- Deep Neuroevolution of Recurrent and Discrete World Models, Risi S., Stanley K.O. (2019). 🎞️ :octocat:
- **FERM** A Framework for Efficient Robotic Manipulation, Zhan A., Zhao R. et al. (2021). :octocat:
- **S4RL** S4RL: Surprisingly Simple Self-Supervision for Offline Reinforcement Learning, Sinha S. et al. (2021).
Offline
- **SPI-BB** Safe Policy Improvement with Baseline Bootstrapping, Laroche R. et al. (2019).
- **AWAC** AWAC: Accelerating Online Reinforcement Learning with Offline Datasets, Nair A. et al. (2020).
- **CQL** Conservative Q-Learning for Offline Reinforcement Learning, Kumar A. et al. (2020).
- Decision Transformer: Reinforcement Learning via Sequence Modeling, Chen L., Lu K. et al. (2021). :octocat:
- Reinforcement Learning as One Big Sequence Modeling Problem, Janner M., Li Q., Levine S. (2021).
Other
- Is the Bellman residual a bad proxy?, Geist M., Piot B., Pietquin O. (2016).
- Deep Reinforcement Learning that Matters, Henderson P. et al. (2017).
- Automatic Bridge Bidding Using Deep Reinforcement Learning, Yeh C. and Lin H. (2016).
- Shared Autonomy via Deep Reinforcement Learning, Reddy S. et al. (2018). 🎞️
- Reinforcement Learning and Control as Probabilistic Inference: Tutorial and Review, Levine S. (2018).
- The Value Function Polytope in Reinforcement Learning, Dadashi R. et al. (2019).
- On Value Functions and the Agent-Environment Boundary, Jiang N. (2019).
- How to Train Your Robot with Deep Reinforcement Learning: Lessons We've Learned, Ibarz J. et al. (2021).
Learning from Demonstrations :mortar_board:
Imitation Learning
- **DAgger** A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning, Ross S., Gordon G., Bagnell J. A. (2011).
- **QMDP-RCNN** Reinforcement Learning via Recurrent Convolutional Neural Networks, Shankar T. et al. (2016). (talk)
- **DQfD** Learning from Demonstrations for Real World Reinforcement Learning, Hester T. et al. (2017). 🎞️
- Find Your Own Way: Weakly-Supervised Segmentation of Path Proposals for Urban Autonomy, Barnes D., Maddern W., Posner I. (2016). 🎞️
- **GAIL** Generative Adversarial Imitation Learning, Ho J., Ermon S. (2016).
- From perception to decision: A data-driven approach to end-to-end motion planning for autonomous ground robots, Pfeiffer M. et al. (2017). 🎞️
- **Branched** End-to-end Driving via Conditional Imitation Learning, Codevilla F. et al. (2017). 🎞️ | talk
- **UPN** Universal Planning Networks, Srinivas A. et al. (2018). 🎞️
- **DeepMimic** DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills, Peng X. B. et al. (2018). 🎞️
- **R2P2** Deep Imitative Models for Flexible Inference, Planning, and Control, Rhinehart N. et al. (2018). 🎞️
- Learning Agile Robotic Locomotion Skills by Imitating Animals, Bin Peng X. et al. (2020). 🎞️
- Deep Imitative Models for Flexible Inference, Planning, and Control, Rhinehart N., McAllister R., Levine S. (2020).
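
The DAgger entry above fixes the distribution-shift problem of plain behavioral cloning with a simple interaction loop: roll out the current learner, relabel the visited states with expert actions, aggregate, retrain. A sketch against assumed `expert_policy`, `train`, and `rollout` interfaces:

```python
def dagger(expert_policy, train, rollout, n_iters=10):
    """DAgger loop (Ross et al., 2011).
    expert_policy(s) -> action, train(states, actions) -> policy, and
    rollout(policy) -> list of visited states are assumed user-supplied."""
    states, actions = [], []
    # Initialize from expert demonstrations alone.
    init_states = rollout(expert_policy)
    states += init_states
    actions += [expert_policy(s) for s in init_states]
    policy = train(states, actions)
    for _ in range(n_iters):
        visited = rollout(policy)                       # learner's own state distribution
        states += visited
        actions += [expert_policy(s) for s in visited]  # expert relabels
        policy = train(states, actions)                 # retrain on the aggregate
    return policy
```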
Applications to Autonomous Driving :car:
- ALVINN, an autonomous land vehicle in a neural network, Pomerleau D. (1989).
- End to End Learning for Self-Driving Cars, Bojarski M. et al. (2016). 🎞️
- End-to-end Learning of Driving Models from Large-scale Video Datasets, Xu H., Gao Y. et al. (2016). 🎞️
- End-to-End Deep Learning for Steering Autonomous Vehicles Considering Temporal Dependencies, Eraqi H. et al. (2017).
- Driving Like a Human: Imitation Learning for Path Planning using Convolutional Neural Networks, Rehder E. et al. (2017).
- Imitating Driver Behavior with Generative Adversarial Networks, Kuefler A. et al. (2017).
- **PS-GAIL** Multi-Agent Imitation Learning for Driving Simulation, Bhattacharyya R. et al. (2018). 🎞️ :octocat:
- Deep Imitation Learning for Autonomous Driving in Generic Urban Scenarios with Enhanced Safety, Chen J. et al. (2019).
Inverse Reinforcement Learning
- **Projection** Apprenticeship learning via inverse reinforcement learning, Abbeel P., Ng A. (2004).
- **MMP** Maximum margin planning, Ratliff N. et al. (2006).
- **BIRL** Bayesian inverse reinforcement learning, Ramachandran D., Amir E. (2007).
- **MEIRL** Maximum Entropy Inverse Reinforcement Learning, Ziebart B. et al. (2008).
- **LEARCH** Learning to search: Functional gradient techniques for imitation learning, Ratliff N., Silver D., Bagnell A. (2009).
- **CIOC** Continuous Inverse Optimal Control with Locally Optimal Examples, Levine S., Koltun V. (2012). 🎞️
- **MEDIRL** Maximum Entropy Deep Inverse Reinforcement Learning, Wulfmeier M. (2015).
- **GCL** Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization, Finn C. et al. (2016). 🎞️
- **RIRL** Repeated Inverse Reinforcement Learning, Amin K. et al. (2017).
- Bridging the Gap Between Imitation Learning and Inverse Reinforcement Learning, Piot B. et al. (2017).
Applications to Autonomous Driving :taxi:
- Apprenticeship Learning for Motion Planning, with Application to Parking Lot Navigation, Abbeel P. et al. (2008).
- Navigate like a cabbie: Probabilistic reasoning from observed context-aware behavior, Ziebart B. et al. (2008).
- Planning-based Prediction for Pedestrians, Ziebart B. et al. (2009). 🎞️
- Learning for autonomous navigation, Bagnell A. et al. (2010).
- Learning Autonomous Driving Styles and Maneuvers from Expert Demonstration, Silver D. et al. (2012).
- Learning Driving Styles for Autonomous Vehicles from Demonstration, Kuderer M. et al. (2015).
- Learning to Drive using Inverse Reinforcement Learning and Deep Q-Networks, Sharifzadeh S. et al. (2016).
- Watch This: Scalable Cost-Function Learning for Path Planning in Urban Environments, Wulfmeier M. (2016). 🎞️
- Planning for Autonomous Cars that Leverage Effects on Human Actions, Sadigh D. et al. (2016).
- A Learning-Based Framework for Handling Dilemmas in Urban Automated Driving, Lee S., Seo S. (2017).
- Learning Trajectory Prediction with Continuous Inverse Optimal Control via Langevin Sampling of Energy-Based Models, Xu Y. et al. (2019).
- Analyzing the Suitability of Cost Functions for Explaining and Imitating Human Driving Behavior based on Inverse Reinforcement Learning, Naumann M. et al. (2020).
Motion Planning :running_man:
Search
- **Dijkstra** A Note on Two Problems in Connexion with Graphs, Dijkstra E. W. (1959).
- **A\*** A Formal Basis for the Heuristic Determination of Minimum Cost Paths, Hart P. et al. (1968).
- Planning Long Dynamically-Feasible Maneuvers For Autonomous Vehicles, Likhachev M., Ferguson D. (2008).
- Optimal Trajectory Generation for Dynamic Street Scenarios in a Frenet Frame, Werling M., Kammel S. (2010). 🎞️
- 3D perception and planning for self-driving and cooperative automobiles, Stiller C., Ziegler J. (2012).
- Motion Planning under Uncertainty for On-Road Autonomous Driving, Xu W. et al. (2014).
- Monte Carlo Tree Search for Simulated Car Racing, Fischer J. et al. (2015). 🎞️
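
For reference, the A* algorithm of Hart et al. in its textbook form; the `neighbors` and admissible `heuristic` callables are assumed user-supplied:

```python
import heapq
import itertools

def a_star(start, goal, neighbors, heuristic):
    """A* graph search. neighbors(n) -> [(m, edge_cost)];
    heuristic(n, goal) must never overestimate the true cost-to-go."""
    tie = itertools.count()  # tie-breaker so the heap never compares nodes
    frontier = [(heuristic(start, goal), next(tie), 0.0, start)]
    came_from, g = {start: None}, {start: 0.0}
    while frontier:
        _, _, g_n, n = heapq.heappop(frontier)
        if n == goal:                      # reconstruct the path
            path = []
            while n is not None:
                path.append(n)
                n = came_from[n]
            return path[::-1]
        if g_n > g[n]:
            continue                       # stale queue entry
        for m, cost in neighbors(n):
            g_m = g_n + cost
            if g_m < g.get(m, float("inf")):
                g[m], came_from[m] = g_m, n
                heapq.heappush(frontier,
                               (g_m + heuristic(m, goal), next(tie), g_m, m))
    return None                            # goal unreachable
```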
Sampling
- **RRT\*** Sampling-based Algorithms for Optimal Motion Planning, Karaman S., Frazzoli E. (2011). 🎞️
- **LQG-MP** LQG-MP: Optimized Path Planning for Robots with Motion Uncertainty and Imperfect State Information, van den Berg J. et al. (2010).
- Motion Planning under Uncertainty using Differential Dynamic Programming in Belief Space, van den Berg J. et al. (2011).
- Rapidly-exploring Random Belief Trees for Motion Planning Under Uncertainty, Bry A., Roy N. (2011).
- **PRM-RL** PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning, Faust A. et al. (2017).
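
A sketch of the basic RRT loop underlying RRT* (which adds rewiring for asymptotic optimality); the `sample_free`, `steer`, and `collision_free` primitives are assumed user-supplied, and nodes are 2D tuples for illustration:

```python
import math

def rrt(start, goal, sample_free, steer, collision_free,
        max_iters=2000, goal_tol=0.5):
    """Rapidly-exploring Random Tree: grow a tree from start by repeatedly
    extending the nearest node toward a random free sample."""
    nodes, parent = [start], {start: None}
    for _ in range(max_iters):
        x_rand = sample_free()
        x_near = min(nodes, key=lambda n: math.dist(n, x_rand))  # nearest node
        x_new = steer(x_near, x_rand)                            # bounded step
        if collision_free(x_near, x_new):
            nodes.append(x_new)
            parent[x_new] = x_near
            if math.dist(x_new, goal) < goal_tol:                # goal reached
                path = [x_new]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
    return None
```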
Optimization
- Trajectory planning for Bertha - A local, continuous method, Ziegler J. et al. (2014).
- Learning Attractor Landscapes for Learning Motor Primitives, Ijspeert A. et al. (2002).
- Online Motion Planning based on Nonlinear Model Predictive Control with Non-Euclidean Rotation Groups, Rösmann C. et al. (2020). :octocat:
Reactive
- **PF** Real-time obstacle avoidance for manipulators and mobile robots, Khatib O. (1986).
- **VFH** The Vector Field Histogram - Fast Obstacle Avoidance For Mobile Robots, Borenstein J. (1991).
- **VFH+** VFH+: Reliable Obstacle Avoidance for Fast Mobile Robots, Ulrich I., Borenstein J. (1998).
- **Velocity Obstacles** Motion planning in dynamic environments using velocity obstacles, Fiorini P., Shiller Z. (1998).
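
A sketch of one attractive/repulsive gradient step in the spirit of Khatib's potential fields; the gains, step size, and influence distance are illustrative assumptions:

```python
import numpy as np

def potential_field_step(q, goal, obstacles, k_att=1.0, k_rep=100.0, d0=2.0):
    """One descent step on U = U_att + U_rep for a 2D point robot.
    q, goal, and each obstacle are numpy arrays of shape (2,)."""
    force = k_att * (goal - q)                      # attractive: pull to goal
    for obs in obstacles:
        diff = q - obs
        d = np.linalg.norm(diff)
        if 1e-9 < d < d0:                           # repulsion only within d0
            force += k_rep * (1.0 / d - 1.0 / d0) / d**3 * diff
    return q + 0.05 * force                         # small step along the force
```

Pure reactive schemes like this are fast but prone to local minima, which is what the histogram- and velocity-space methods above are designed to mitigate.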
Architecture and applications
- A Review of Motion Planning Techniques for Automated Vehicles, González D. et al. (2016).
- A Survey of Motion Planning and Control Techniques for Self-driving Urban Vehicles, Paden B. et al. (2016).
- Autonomous driving in urban environments: Boss and the Urban Challenge, Urmson C. et al. (2008).
- The MIT-Cornell collision and why it happened, Fletcher L. et al. (2008).
- Making bertha drive-an autonomous journey on a historic route, Ziegler J. et al. (2014).