Awesome AI Safety

Figuring out how to make your AI safer? How to avoid ethical biases, errors, privacy leaks or robustness issues in your AI models?

This repository contains a curated list of papers & technical articles on AI Quality & Safety that should help 📚

You can browse papers by Machine Learning task category, and use hashtags like #Robustness to explore AI risk types.

General ML Testing
AI Incident Databases
Tabular Machine Learning
Natural Language Processing
Large Language Models
Computer Vision
Recommendation System
Time Series

General ML Testing

Machine learning testing: Survey, landscapes and horizons (Zhang et al., 2020) #General
Quality Assurance for AI-based Systems: Overview and Challenges (Felderer et al., 2021) #General
The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction (Breck et al., 2017) #General
Reliable Machine Learning: Applying SRE Principles to ML in Production [BOOK] (Chen et al., 2022) #Reliability
Metamorphic testing of decision support systems: A case study (Kuo et al., 2010) #Robustness
A Survey on Metamorphic Testing (Segura et al., 2016) #Robustness
Testing and validating machine learning classifiers by metamorphic testing (Xie et al., 2011) #Robustness
The Disagreement Problem in Explainable Machine Learning: A Practitioner’s Perspective (Krishna et al., 2022) #Explainability
InterpretML: A Unified Framework for Machine Learning Interpretability (Nori et al., 2019) #Explainability #General
Fair regression: Quantitative definitions and reduction-based algorithms (Agarwal et al., 2019) #Fairness
Learning Optimal and Fair Decision Trees for Non-Discriminative Decision-Making (Aghaei et al., 2019) #Fairness
Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning (Henderson et al., 2020) #Environment

AI Incident Databases

AI Incident Database (Responsible AI Collaborative)
AI Vulnerability Database (AVID)

Tabular Machine Learning

Machine Learning Model Drift Detection Via Weak Data Slices (Ackerman et al., 2021) #DataSlice #Debugging #Drift
Automated Data Slicing for Model Validation: A Big Data - AI Integration Approach (Chung et al., 2020) #DataSlice
Interacting with Predictions: Visual Inspection of Black-box Machine Learning Models (Krause et al., 2016) #Explainability

Natural Language Processing

Beyond Accuracy: Behavioral Testing of NLP Models with CheckList (Ribeiro et al., 2020) #Robustness
Pipelines for Social Bias Testing of Large Language Models (Nozza et al., 2022) #Bias #Ethics
"Why Should I Trust You?": Explaining the Predictions of Any Classifier (Ribeiro et al., 2016) #Explainability
A Unified Approach to Interpreting Model Predictions (Lundberg et al., 2017) #Explainability
Anchors: High-Precision Model-Agnostic Explanations (Ribeiro et al., 2018) #Explainability
Explanation-Based Human Debugging of NLP Models: A Survey (Lertvittayakumjorn et al., 2021) #Debugging
SEAL: Interactive Tool for Systematic Error Analysis and Labeling (Rajani et al., 2022) #DataSlice #Explainability
Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science (Bender and Friedman, 2018) #Bias
Equalizing Gender Biases in Neural Machine Translation with Word Embeddings Techniques (Font and Costa-jussà, 2019) #Bias
On Measuring Social Biases in Sentence Encoders (May et al., 2019) #Bias
BBQ: A Hand-Built Bias Benchmark for Question Answering (Parrish et al., 2022) #Bias
What Do You See in this Patient? Behavioral Testing of Clinical NLP Models (Van Aken et al., 2021) #Bias

Large Language Models

Holistic Evaluation of Language Models (Liang et al., 2022) #General
Learning to summarize from human feedback (Stiennon et al., 2020) #HumanFeedback
Identifying and Reducing Gender Bias in Word-Level Language Models (Bordia and Bowman, 2019) #Bias

Computer Vision

Domino: Discovering Systematic Errors with Cross-Modal Embeddings (Eyuboglu et al., 2022) #DataSlice
Explaining in Style: Training a GAN to explain a classifier in StyleSpace (Lang et al., 2022) #Robustness
Model Assertions for Debugging Machine Learning (Kang et al., 2018) #Debugging
Uncovering and Mitigating Algorithmic Bias through Learned Latent Structure (Amini et al.) #Bias
Diversity in Faces (Merler et al.) #Fairness #Accuracy

Recommendation System

Beyond NDCG: behavioral testing of recommender systems with RecList (Chia et al., 2021) #Robustness

Time Series

Contributions are welcome 💕
