awesome-ai-safety
A curated list of papers & technical articles on AI Quality & Safety
Awesome AI Safety ![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)
Figuring out how to make your AI safer? How to avoid ethical biases, errors, privacy leaks or robustness issues in your AI models?
This repository contains a curated list of papers & technical articles on AI Quality & Safety that should help.
Table of Contents
You can browse papers by machine learning task category and use hashtags like #Robustness to explore AI risk types.
- General ML Testing
- Tabular Machine Learning
- Natural Language Processing
- Computer Vision
- Recommendation System
- Time Series
General ML Testing
- Machine learning testing: Survey, landscapes and horizons (Zhang et al., 2020) #General
- Quality Assurance for AI-based Systems: Overview and Challenges (Felderer et al., 2021) #General
- The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction (Breck et al., 2017) #General
- Reliable Machine Learning: Applying SRE Principles to ML in Production [BOOK] (Chen et al., 2022) #Reliability
- Metamorphic testing of decision support systems: A case study (Kuo et al., 2010) #Robustness
- A Survey on Metamorphic Testing (Segura et al., 2016) #Robustness
- Testing and validating machine learning classifiers by metamorphic testing (Xie et al., 2011) #Robustness
- The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective (Krishna et al., 2022) #Explainability
- InterpretML: A Unified Framework for Machine Learning Interpretability (Nori et al., 2019) #Explainability #General
- Fair regression: Quantitative definitions and reduction-based algorithms (Agarwal et al., 2019) #Fairness
- Learning Optimal and Fair Decision Trees for Non-Discriminative Decision-Making (Aghaei et al., 2019) #Fairness
- Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning (Henderson et al., 2020) #Environment
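The metamorphic testing papers above check a model through relations between its outputs on related inputs, which avoids needing a ground-truth oracle. As a minimal sketch, assuming a plain Python k-nearest-neighbour classifier (the names `knn_predict` and `check_metamorphic_relations` are hypothetical, not taken from any listed paper or library), two relations chosen for illustration are: reversing the order of the training instances, and scaling every feature of the training and test data by the same positive constant, should both leave kNN predictions unchanged.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Plain k-nearest-neighbour classifier using Euclidean distance."""
    # Sorting (distance, label) tuples makes distance ties resolve by label,
    # so the result never depends on the order of the training data.
    neighbours = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    best = max(votes.values())
    # Break vote ties by the smallest label, again for determinism.
    return min(label for label, count in votes.items() if count == best)


def check_metamorphic_relations(predict, train_X, train_y, test_X, k=3):
    """Return (relation, test index) pairs where a metamorphic relation fails."""
    source = [predict(train_X, train_y, x, k) for x in test_X]
    violations = []

    # MR1: reversing the order of the training instances must not change
    # any prediction (a fixed reversal keeps this sketch deterministic).
    rev_X, rev_y = train_X[::-1], train_y[::-1]
    for i, x in enumerate(test_X):
        if predict(rev_X, rev_y, x, k) != source[i]:
            violations.append(("reverse_training_order", i))

    # MR2: scaling all features of train and test data by the same positive
    # constant scales every distance equally, so kNN output must not change.
    scale = 2.0
    scaled_X = [[v * scale for v in row] for row in train_X]
    for i, x in enumerate(test_X):
        if predict(scaled_X, train_y, [v * scale for v in x], k) != source[i]:
            violations.append(("scale_features", i))
    return violations


def buggy(X, y, x, k):
    """Deliberately order-dependent "model": always returns the first label."""
    return y[0]


train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[0.5, 0.5], [5.5, 5.5], [1, 1], [4, 4]]

print(check_metamorphic_relations(knn_predict, train_X, train_y, test_X))  # []
print(check_metamorphic_relations(buggy, train_X, train_y, test_X))
```

The correct kNN passes both relations. The buggy predictor violates MR1 on all four test points, so the relation exposes the defect even though no expected labels were ever supplied.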
AI Incident Databases
- AI Incident Database (Responsible AI Collaborative)
- AI Vulnerability Database (AVID)
Tabular Machine Learning
- Machine Learning Model Drift Detection Via Weak Data Slices (Ackerman et al., 2021) #DataSlice #Debugging #Drift
- Automated Data Slicing for Model Validation: A Big Data - AI Integration Approach (Chung et al., 2020) #DataSlice
- Interacting with Predictions: Visual Inspection of Black-box Machine Learning Models (Krause et al., 2016) #Explainability
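The data-slicing papers above look for subsets of the data where a model underperforms. A minimal sketch of that idea, assuming categorical features stored as plain dicts and a per-row error flag (the `find_weak_slices` function and its `min_size`/`min_gap` thresholds are hypothetical), flags single-feature slices whose error rate exceeds the overall rate by a fixed margin. It is a simplification and not the algorithm from Ackerman et al. or Chung et al.; the published methods go further, for example by searching combinations of feature values.

```python
from collections import defaultdict


def find_weak_slices(rows, errors, min_size=2, min_gap=0.2):
    """Flag single-feature slices whose error rate is well above the overall rate.

    rows:   list of dicts mapping feature name -> categorical value
    errors: list of bools, True where the model's prediction was wrong
    Returns (feature, value, slice size, slice error rate), worst first.
    """
    overall = sum(errors) / len(errors)
    stats = defaultdict(lambda: [0, 0])  # (feature, value) -> [rows, errors]
    for row, err in zip(rows, errors):
        for feature, value in row.items():
            stats[(feature, value)][0] += 1
            stats[(feature, value)][1] += int(err)

    weak = []
    for (feature, value), (n, n_err) in stats.items():
        rate = n_err / n
        # Ignore tiny slices, whose error rate is too noisy to trust.
        if n >= min_size and rate - overall >= min_gap:
            weak.append((feature, value, n, rate))
    return sorted(weak, key=lambda s: s[3], reverse=True)


rows = [
    {"region": "north", "device": "mobile"},
    {"region": "north", "device": "desktop"},
    {"region": "south", "device": "mobile"},
    {"region": "south", "device": "desktop"},
    {"region": "south", "device": "mobile"},
    {"region": "north", "device": "desktop"},
]
errors = [False, False, True, True, True, False]
print(find_weak_slices(rows, errors))  # [('region', 'south', 3, 1.0)]
```

Here the overall error rate is 0.5, but every `south` row is wrong, so that slice is reported; `device == "mobile"` (2 of 3 wrong) stays below the default 0.2 margin.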
Natural Language Processing
- Beyond Accuracy: Behavioral Testing of NLP Models with CheckList (Ribeiro et al., 2020) #Robustness
- Pipelines for Social Bias Testing of Large Language Models (Nozza et al., 2022) #Bias #Ethics
- "Why Should I Trust You?": Explaining the Predictions of Any Classifier (Ribeiro et al., 2016) #Explainability
- A Unified Approach to Interpreting Model Predictions (Lundberg et al., 2017) #Explainability
- Anchors: High-Precision Model-Agnostic Explanations (Ribeiro et al., 2018) #Explainability
- Explanation-Based Human Debugging of NLP Models: A Survey (Lertvittayakumjorn et al., 2021) #Debugging
- SEAL: Interactive Tool for Systematic Error Analysis and Labeling (Rajani et al., 2022) #DataSlice #Explainability
- Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science (Bender and Friedman, 2018) #Bias
- Equalizing Gender Biases in Neural Machine Translation with Word Embeddings Techniques (Font and Costa-jussà, 2019) #Bias
- On Measuring Social Biases in Sentence Encoders (May et al., 2019) #Bias
- BBQ: A Hand-Built Bias Benchmark for Question Answering (Parrish et al., 2022) #Bias
- What Do You See in this Patient? Behavioral Testing of Clinical NLP Models (Van Aken et al., 2021) #Bias
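CheckList (Ribeiro et al., 2020) organises behavioural tests into types such as invariance (INV) tests, where a label-preserving perturbation should not change the model's output. A minimal sketch of an INV test that swaps person names, assuming the model is exposed as a plain `predict(text) -> label` function. The names `invariance_test` and `toy_sentiment` are hypothetical, and the toy model carries a deliberately planted bias so the test has something to catch. This sketch does not use the actual `checklist` library API.

```python
import re


def invariance_test(predict, texts, name_pairs):
    """CheckList-style INV test: swapping a person's name must not change the label.

    Returns (original, perturbed, label before, label after) for each failure.
    """
    failures = []
    for text in texts:
        before = predict(text)
        for old, new in name_pairs:
            pattern = rf"\b{re.escape(old)}\b"
            if not re.search(pattern, text):
                continue  # this perturbation does not apply to the input
            perturbed = re.sub(pattern, lambda _m: new, text)
            after = predict(perturbed)
            if after != before:
                failures.append((text, perturbed, before, after))
    return failures


def toy_sentiment(text):
    """Hypothetical keyword model with a deliberately planted name bias."""
    lowered = text.lower()
    score = sum(w in lowered for w in ("great", "good", "love"))
    score -= sum(w in lowered for w in ("bad", "awful", "hate"))
    if "Ahmed" in text:  # planted bias that the INV test should expose
        score -= 1
    return "positive" if score > 0 else "negative"


texts = ["John gave a great talk.", "Mary had an awful day."]
pairs = [("John", "Ahmed"), ("Mary", "Emma")]
for failure in invariance_test(toy_sentiment, texts, pairs):
    print(failure)
# ('John gave a great talk.', 'Ahmed gave a great talk.', 'positive', 'negative')
```

Only the John-to-Ahmed swap flips the label, which is exactly the kind of name-dependent behaviour the bias-testing papers in this section probe for.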
Large Language Models
- Holistic Evaluation of Language Models (Liang et al., 2022) #General
- Learning to summarize from human feedback (Stiennon et al., 2020) #HumanFeedback
- Identifying and Reducing Gender Bias in Word-Level Language Models (Bordia and Bowman, 2019) #Bias
Computer Vision
- DOMINO: Discovering Systematic Errors with Cross-modal Embeddings (Eyuboglu et al., 2022) #DataSlice
- Explaining in Style: Training a GAN to explain a classifier in StyleSpace (Lang et al., 2022) #Robustness
- Model Assertions for Debugging Machine Learning (Kang et al., 2018) #Debugging
- Uncovering and Mitigating Algorithmic Bias through Learned Latent Structure (Amini et al.) #Bias
- Diversity in Faces (Merler et al.) #Fairness #Accuracy
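Model assertions (Kang et al., 2018) are checks over a model's outputs that flag likely errors without needing ground-truth labels. A minimal sketch of one such check for video object detection, assuming tracked object IDs per frame are already available as Python sets (the `flicker_assertion` name and this input format are hypothetical): it flags an object that is detected in the frames before and after frame `t` but missing from frame `t` itself. This kind of temporal inconsistency usually points to a missed detection.

```python
def flicker_assertion(frames):
    """Return (frame index, object id) pairs where an object vanishes for one frame.

    frames: list of sets holding the tracked object IDs detected in each frame.
    """
    violations = []
    for t in range(1, len(frames) - 1):
        # Seen in the previous and next frame, but missing from frame t.
        missing = (frames[t - 1] & frames[t + 1]) - frames[t]
        violations.extend((t, obj) for obj in sorted(missing))
    return violations


frames = [{"car1", "car2"}, {"car1"}, {"car1", "car2"}, {"car2"}, {"car2"}]
print(flicker_assertion(frames))  # [(1, 'car2')]
```

`car1` leaving the scene after frame 2 is not flagged, because it never reappears; only the one-frame gap for `car2` triggers the assertion.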
Recommendation System
- Beyond NDCG: behavioral testing of recommender systems with RecList (Chia et al., 2021) #Robustness
Time Series
Contributions are welcome.