ai-alignment topic

awesome-trustworthy-deep-learning

291 stars, 32 forks

A curated list of trustworthy deep learning papers, updated daily.

awesome-ai-alignment

57 stars, 9 forks

A curated list of awesome resources for getting started with, and staying in touch with, Artificial Intelligence Alignment research.

PromptInject

276 stars, 27 forks

PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Sa...

make-safe-ai

169 stars, 7 forks

How to Make Safe AI? Let's Discuss! 💡|💬|🙌|📚

awesome-ai-safety

140 stars, 11 forks

📚 A curated list of papers & technical articles on AI Quality & Safety

pretraining-with-human-feedback

167 stars, 14 forks

Code accompanying the paper "Pretraining Language Models with Human Preferences"

Sight-Beyond-Text

19 stars, 1 fork

This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"

adversarial-reinforcement-learning

77 stars, 5 forks

A reading list on adversarial perspectives and robustness in deep reinforcement learning.

aiwatch

20 stars, 6 forks

Website to track people, organizations, and products (tools, websites, etc.) in AI safety