llm-safety topic

Repositories tagged with llm-safety

Hallucination-Attack
102 Stars, 12 Forks

An attack to induce hallucinations in LLMs

resta
25 Stars, 1 Fork

Restore safety in fine-tuned language models through task arithmetic

OpenRedTeaming
73 Stars, 4 Forks

Papers about red teaming LLMs and multimodal models.

ALERT
30 Stars, 7 Forks

Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"

editing-attack
15 Stars, 1 Fork

Code and dataset for the paper: "Can Editing LLMs Inject Harm?"

ai-testing-prompts
19 Stars, 0 Forks, 19 Watchers

Comprehensive LLM testing suite for safety, performance, bias, and compliance, equipped with methodologies and tools to enhance the reliability and ethical integrity of models like OpenAI's GPT series...

deepteam
1.0k Stars, 148 Forks, 1.0k Watchers

DeepTeam is a framework to red team LLMs and LLM systems.

JailbreakEval
172 Stars, 11 Forks, 172 Watchers

[NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.
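
A listing like the one above can be regenerated from GitHub's search API by querying the llm-safety topic. The Python sketch below is illustrative only: it uses just the standard library, and the GITHUB_TOKEN environment variable it reads is an optional assumption for authenticated (higher rate limit) requests.

    # Fetch repositories tagged with the llm-safety topic from the GitHub
    # search API and print them as "name / stars / forks / description".
    import json
    import os
    import urllib.request

    URL = ("https://api.github.com/search/repositories"
           "?q=topic:llm-safety&sort=stars&order=desc&per_page=20")

    def fetch_llm_safety_repos(url: str = URL) -> list[dict]:
        headers = {"Accept": "application/vnd.github+json"}
        token = os.environ.get("GITHUB_TOKEN")  # optional personal access token
        if token:
            headers["Authorization"] = f"Bearer {token}"
        request = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(request) as response:
            payload = json.load(response)
        return payload["items"]

    if __name__ == "__main__":
        for repo in fetch_llm_safety_repos():
            print(f"{repo['name']}")
            print(f"{repo['stargazers_count']} Stars, {repo['forks_count']} Forks")
            print(f"{repo.get('description') or '(no description)'}")
            print()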