llm-safety topic
llm-safety repositories
Hallucination-Attack (102 stars, 12 forks)
Attack to induce hallucinations in LLMs
resta (25 stars, 1 fork)
Restore safety in fine-tuned language models through task arithmetic
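The resta entry names task arithmetic as its mechanism. As a rough illustration only (not the repository's actual API; the function names, the toy state dicts, and the 1.0 scale are assumptions), the idea is to treat the weight difference between a safety-aligned checkpoint and its base model as a "safety vector" and add it back to a fine-tuned model whose safety has degraded:

import torch

def task_vector(finetuned: dict, base: dict) -> dict:
    # Parameter-wise difference between two state dicts:
    # the "task vector" for whatever the fine-tune learned.
    return {k: finetuned[k] - base[k] for k in base}

def apply_vector(state: dict, vector: dict, scale: float) -> dict:
    # Add a scaled task vector back onto a model's weights.
    return {k: state[k] + scale * vector[k] for k in state}

# Toy state dicts standing in for real checkpoints (hypothetical shapes).
base = {"w": torch.randn(4, 4)}
safety_aligned = {"w": base["w"] + 0.1 * torch.randn(4, 4)}
fine_tuned = {"w": base["w"] + 0.2 * torch.randn(4, 4)}

# Safety vector: what alignment added on top of the base model.
safety_vec = task_vector(safety_aligned, base)

# Re-add a scaled copy of that vector to the fine-tuned model,
# aiming to restore the safety behaviour the fine-tune eroded.
restored = apply_vector(fine_tuned, safety_vec, scale=1.0)

In practice the scale would be tuned to trade off restored safety against fine-tuned task performance; see the repository for the authors' actual procedure.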
OpenRedTeaming (73 stars, 4 forks)
Papers about red teaming LLMs and multimodal models.
ALERT (30 stars, 7 forks)
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming"
editing-attack (15 stars, 1 fork)
Code and dataset for the paper "Can Editing LLMs Inject Harm?"