llm-safety topic

Repositories tagged with the llm-safety topic

Hallucination-Attack
Stars: 102 · Forks: 12

An attack that induces hallucinations in LLMs

resta
Stars: 25 · Forks: 1

Restore safety in fine-tuned language models through task arithmetic
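For context, "task arithmetic" generally refers to adding or subtracting weight-difference vectors between model checkpoints. The sketch below illustrates that general idea in Python/PyTorch; the checkpoint file names, the safety_vector variable, and the assumption that each file holds a state_dict of tensors are illustrative, not this repository's actual code or API.

    # Minimal sketch of restoring safety via task arithmetic (illustrative only;
    # paths, names, and checkpoint format are assumptions, not resta's API).
    import torch

    aligned = torch.load("aligned_state.pt")       # safety-aligned reference weights
    unaligned = torch.load("unaligned_state.pt")   # weights with safety behavior removed
    finetuned = torch.load("finetuned_state.pt")   # task-fine-tuned weights to repair

    # Safety vector: the weight delta associated with safety alignment.
    safety_vector = {k: aligned[k] - unaligned[k] for k in aligned}

    # Add the scaled safety vector back onto the fine-tuned model.
    alpha = 1.0
    restored = {k: finetuned[k] + alpha * safety_vector[k] for k in finetuned}

    torch.save(restored, "restored_state.pt")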

OpenRedTeaming
Stars: 73 · Forks: 4

Papers about red teaming LLMs and multimodal models.

ALERT
Stars: 30 · Forks: 7

Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"

editing-attack
Stars: 15 · Forks: 1

Code and dataset for the paper: "Can Editing LLMs Inject Harm?"