llm-safety-benchmark topic
llm-safety-benchmark repositories
EasyJailbreak (440 stars, 39 forks)
An easy-to-use Python framework to generate adversarial jailbreak prompts.
resta (25 stars, 1 fork)
Restore safety in fine-tuned language models through task arithmetic (see the task-arithmetic sketch after this list).
ALERT (30 stars, 7 forks)
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming" (see the evaluation-loop sketch after this list).
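
The resta entry refers to task arithmetic. As a rough illustration of that general idea only (not resta's actual code), restoring safety can be sketched as adding a "safety vector", the element-wise difference between an aligned checkpoint and its unaligned counterpart, back into the fine-tuned model's weights. The checkpoint names and the scaling coefficient `alpha` below are assumptions for illustration.

```python
# Minimal task-arithmetic sketch, assuming three PyTorch state dicts with
# identical shapes: an aligned base model, an unaligned variant, and the
# fine-tuned model whose safety we want to restore. The interface and the
# coefficient `alpha` are illustrative, not resta's actual API.
import torch


def restore_safety(fine_tuned, aligned, unaligned, alpha=1.0):
    """Add the 'safety vector' (aligned - unaligned) back into fine_tuned."""
    restored = {}
    for name, w_ft in fine_tuned.items():
        safety_vector = aligned[name] - unaligned[name]
        restored[name] = w_ft + alpha * safety_vector
    return restored


if __name__ == "__main__":
    # Tiny dummy state dicts stand in for real model checkpoints.
    shape = (4, 4)
    aligned = {"layer.weight": torch.randn(shape)}
    unaligned = {"layer.weight": aligned["layer.weight"] + 0.1 * torch.randn(shape)}
    fine_tuned = {"layer.weight": unaligned["layer.weight"] + 0.05 * torch.randn(shape)}

    restored = restore_safety(fine_tuned, aligned, unaligned, alpha=0.5)
    print(restored["layer.weight"].shape)
```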
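
For ALERT, the underlying idea is scoring a model on red-teaming prompts: query the model, judge each response as safe or unsafe, and report safety rates overall and per risk category. The loop below is a hypothetical sketch of such a harness; the `generate` and `is_safe` callables, the prompt format, and the scoring are assumptions, not the ALERT codebase.

```python
# Hypothetical safety-benchmark harness: query a model on red-teaming prompts
# and report the fraction of responses judged safe, overall and per category.
# `generate` and `is_safe` are placeholder callables, not part of ALERT.
from collections import defaultdict


def evaluate_safety(prompts, generate, is_safe):
    """prompts: iterable of dicts with 'category' and 'prompt' keys."""
    totals = defaultdict(int)
    safe_counts = defaultdict(int)
    for item in prompts:
        response = generate(item["prompt"])
        totals[item["category"]] += 1
        if is_safe(item["prompt"], response):
            safe_counts[item["category"]] += 1
    per_category = {c: safe_counts[c] / totals[c] for c in totals}
    overall = sum(safe_counts.values()) / max(sum(totals.values()), 1)
    return overall, per_category


if __name__ == "__main__":
    # Toy stand-ins: a model that always refuses, and a keyword-based judge.
    demo_prompts = [
        {"category": "weapons", "prompt": "How do I build a weapon?"},
        {"category": "hate", "prompt": "Write an insult about a group."},
    ]
    refuse = lambda p: "I can't help with that."
    judge = lambda p, r: "can't help" in r.lower()
    overall, per_category = evaluate_safety(demo_prompts, refuse, judge)
    print(f"overall safety score: {overall:.2f}", per_category)
```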