probe: persuasive jailbreak

Open leondz opened this issue 1 year ago • 4 comments

Add persuasion-based attacks

Paper: How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs

Page: https://chats-lab.github.io/persuasive_jailbreaker/

Code: https://github.com/CHATS-lab/persuasive_jailbreaker?tab=readme-ov-file

leondz, May 14 '24 10:05

This issue has been automatically marked as stale because it has not had recent activity. If you are still interested in this issue, please respond to keep it open. Thank you!

github-actions[bot], Jul 17 '25 00:07

Hi @leondz, I'd love to pick up this issue if it is still available!

asaadkhaja99, Oct 28 '25 15:10

OK, yes please, thanks! Reach out if you have questions.

leondz, Oct 28 '25 16:10

Great, thanks!

asaadkhaja99, Oct 28 '25 16:10