llm-safety topic
llm-safety repositories
Hallucination-Attack
84 stars, 11 forks
Attack to induce hallucinations in LLMs
resta
20 stars, 1 fork
Restore safety in fine-tuned language models through task arithmetic
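The task arithmetic referenced here operates in weight space: a "safety vector" is the parameter-wise difference between a safety-aligned model and its base model, and adding that vector back into a fine-tuned model is meant to restore alignment lost during fine-tuning. The following is a minimal sketch of that idea; the function name, the scaling factor alpha, and the exact recipe are illustrative assumptions, not necessarily what the resta code does.

    import torch

    def restore_safety(finetuned, base, aligned, alpha=1.0):
        """Add the safety vector (aligned - base) into a fine-tuned model.

        All three models are assumed to share the same architecture and
        parameter names (e.g. a base LLM, its safety-aligned release, and
        a task fine-tune of that aligned release).
        """
        base_params = dict(base.named_parameters())
        aligned_params = dict(aligned.named_parameters())
        with torch.no_grad():
            for name, param in finetuned.named_parameters():
                # theta_restored = theta_finetuned + alpha * (theta_aligned - theta_base)
                safety_vector = aligned_params[name] - base_params[name]
                param.add_(alpha * safety_vector)
        return finetuned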
OpenRedTeaming
37 stars, 2 forks
Papers about red teaming LLMs and multimodal models.