llm-safety-benchmark topic

Repositories tagged with the llm-safety-benchmark topic:

EasyJailbreak

440 stars · 39 forks

An easy-to-use Python framework to generate adversarial jailbreak prompts.

resta

25 stars · 1 fork

Restore safety in fine-tuned language models through task arithmetic.

ALERT

30 stars · 7 forks

Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"