llm-safety topic

Repositories tagged with the llm-safety topic

Hallucination-Attack
Stars: 102 · Forks: 12

An attack that induces hallucinations in LLMs

resta
Stars: 25 · Forks: 1

Restore safety in fine-tuned language models through task arithmetic
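For context, "task arithmetic" generally refers to adding or subtracting weight-difference vectors between model checkpoints. The sketch below illustrates that general idea in Python/PyTorch; the checkpoint file names, the safety_vector variable, and the assumption that each file holds a state_dict of tensors are illustrative, not this repository's actual code or API.

    # Minimal sketch of restoring safety via task arithmetic (illustrative only;
    # paths, names, and checkpoint format are assumptions, not resta's API).
    import torch

    aligned = torch.load("aligned_state.pt")       # safety-aligned reference weights
    unaligned = torch.load("unaligned_state.pt")   # weights with safety behavior removed
    finetuned = torch.load("finetuned_state.pt")   # task-fine-tuned weights to repair

    # Safety vector: the weight delta associated with safety alignment.
    safety_vector = {k: aligned[k] - unaligned[k] for k in aligned}

    # Add the scaled safety vector back onto the fine-tuned model.
    alpha = 1.0
    restored = {k: finetuned[k] + alpha * safety_vector[k] for k in finetuned}

    torch.save(restored, "restored_state.pt")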

OpenRedTeaming
Stars: 73 · Forks: 4

Papers about red teaming LLMs and multimodal models.

ALERT
Stars: 30 · Forks: 7

Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"

editing-attack
Stars: 15 · Forks: 1

Code and dataset for the paper: "Can Editing LLMs Inject Harm?"