Awesome AI Safety

A curated list of awesome AI safety papers, projects and communities.
Table of Contents
- Videos and Lectures
- Papers
- Researchers
- Websites
- Miscellaneous
- Contributing
Videos and Lectures
- Concrete Problems in AI Safety by Robert Miles
- Online course on AI safety
- Safe Reinforcement Learning by Philip Thomas
- Safe Reinforcement Learning by Mohammad Ghavamzadeh
- Safe RL for robotics by Felix Berkenkamp
- Safe Artificial Intelligence by Victoria Krakovna
Papers
- Scalable agent alignment via reward modeling: a research direction
- AGI safety literature review
- Concrete Problems in AI Safety
- Preventing Side-effects in Gridworlds
- A Gym Gridworld Environment for the Treacherous Turn
- Preferences Implicit in the State of the World
- Conservative Agency via Attainable Utility Preservation
- Penalizing side effects using stepwise relative reachability
- Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning
- Incorrigibility in the CIRL Framework
- The Off-Switch Game
- Corrigibility
- Learning the Preferences of Ignorant, Inconsistent Agents
- Cooperative inverse reinforcement learning
- Towards Interactive Inverse Reinforcement Learning
- Repeated Inverse Reinforcement Learning
- Should robots be obedient?
- Inverse Reward Design
- Learning Robust Rewards with Adversarial Inverse Reinforcement Learning
- Simplifying Reward Design through Divide-and-Conquer
- An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning
- Reward learning from human preferences and demonstrations in Atari
- Supervising strong learners by amplifying weak experts
- AI safety via debate
- Trial without Error: Towards Safe Reinforcement Learning via Human Intervention
- Deep reinforcement learning from human preferences
- Agent-Agnostic Human-in-the-Loop Reinforcement Learning
- Avoiding Wireheading with Value Reinforcement Learning
- Reinforcement learning with a corrupted reward channel
Safe Exploration
- Safe Exploration in Finite Markov Decision Processes with Gaussian Processes
- Safe Exploration for Interactive Machine Learning
- Stagewise Safe Bayesian Optimization with Gaussian Processes
- Safe Exploration in Continuous Action Spaces
- A Lyapunov-based Approach to Safe Reinforcement Learning
- Lyapunov-based Safe Policy Optimization for Continuous Control
- IPO: Interior-point Policy Optimization under Constraints
- CPO: Constrained Policy Optimization
Tutorials
Researchers
Websites
- https://80000hours.org/articles/ai-safety-syllabus/
- https://humancompatible.ai/bibliography
- http://aisafety.stanford.edu/
- https://intelligence.org/research/#publications
- https://ai-alignment.com/
- https://vkrakovna.wordpress.com/
- https://forum.effectivealtruism.org/
Blogs
Contributing
Have anything in mind that you think is awesome and would fit in this list? Feel free to send a pull request.
License
To the extent possible under law, Harshit Sikchi has waived all copyright and related or neighboring rights to this work.