ai-alignment topic
awesome-trustworthy-deep-learning
A curated list of trustworthy deep learning papers. Updated daily.
awesome-ai-alignment
A curated list of awesome resources for getting started with, and staying in touch with, Artificial Intelligence Alignment research.
PromptInject
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Sa...
make-safe-ai
How to Make Safe AI? Let's Discuss! 💡|💬|🙌|📚
awesome-ai-safety
📚 A curated list of papers & technical articles on AI Quality & Safety
pretraining-with-human-feedback
Code accompanying the paper Pretraining Language Models with Human Preferences
Sight-Beyond-Text
This repository includes the official implementation of our paper "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"
adversarial-reinforcement-learning
Reading list on adversarial perspectives and robustness in deep reinforcement learning.
sparse-probing-paper
Full code for the sparse probing paper.
aiwatch
Website to track people, organizations, and products (tools, websites, etc.) in AI safety