ai-safety topic

List ai-safety repositories

ethics

214
Stars
34
Forks
Watchers

Aligning AI With Shared Human Values (ICLR 2021)

awesome-machine-learning-interpretability

3.5k
Stars
578
Forks
Watchers

A curated list of awesome responsible machine learning resources.

giskard

3.3k
Stars
215
Forks
Watchers

🐢 Open-Source Evaluation & Testing for LLMs and ML models

FSSD_OoD_Detection

80
Stars
12
Forks
Watchers

Feature Space Singularity for Out-of-Distribution Detection. (SafeAI 2021)

FLAT

66
Stars
10
Forks
Watchers

[ICCV2021 Oral] Fooling LiDAR by Attacking GPS Trajectory

entropic-out-of-distribution-detection

75
Stars
10
Forks
Watchers

A project to add scalable state-of-the-art out-of-distribution detection (open set recognition) support by changing two lines of code! Perform efficient inferences (i.e., do not increase inference tim...

distinction-maximization-loss

45
Stars
5
Forks
Watchers

A project to improve out-of-distribution detection (open set recognition) and uncertainty estimation by changing a few lines of code in your project! Perform efficient inferences (i.e., do not increas...

awesome-ai-alignment

57
Stars
9
Forks
Watchers

A curated list of awesome resources for getting-started-with and staying-in-touch-with Artificial Intelligence Alignment research.

PromptInject

276
Stars
27
Forks
Watchers

PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Sa...

Thought-Cloning

233
Stars
20
Forks
Watchers

[NeurIPS '23 Spotlight] Thought Cloning: Learning to Think while Acting by Imitating Human Thinking