ai-safety topic
ethics
Aligning AI With Shared Human Values (ICLR 2021)
awesome-machine-learning-interpretability
A curated list of awesome responsible machine learning resources.
giskard
🐢 Open-Source Evaluation & Testing for LLMs and ML models
FSSD_OoD_Detection
Feature Space Singularity for Out-of-Distribution Detection. (SafeAI 2021)
FLAT
[ICCV2021 Oral] Fooling LiDAR by Attacking GPS Trajectory
entropic-out-of-distribution-detection
A project to add scalable state-of-the-art out-of-distribution detection (open set recognition) support by changing two lines of code! Perform efficient inferences (i.e., do not increase inference tim...
distinction-maximization-loss
A project to improve out-of-distribution detection (open set recognition) and uncertainty estimation by changing a few lines of code in your project! Perform efficient inferences (i.e., do not increas...
awesome-ai-alignment
A curated list of awesome resources for getting started with and staying in touch with Artificial Intelligence Alignment research.
PromptInject
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Sa...
Thought-Cloning
[NeurIPS '23 Spotlight] Thought Cloning: Learning to Think while Acting by Imitating Human Thinking