llm-safety topic
Hallucination-Attack
Attack to induce LLMs within hallucinations
resta
Restore safety in fine-tuned language models through task arithmetic
OpenRedTeaming
Papers about red teaming LLMs and Multimodal models.
ALERT
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"
editing-attack
Code and dataset for the paper: "Can Editing LLMs Inject Harm?"
ai-testing-prompts
Comprehensive LLM testing suite for safety, performance, bias, and compliance, equipped with methodologies and tools to enhance the reliability and ethical integrity of models like OpenAI's GPT series...
deepteam
DeepTeam is a framework to red team LLMs and LLM systems.
JailbreakEval
[NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.