prompt-injection-defenses
JailGuard: A Universal Detection Framework for LLM Prompt-based Attacks
https://arxiv.org/pdf/2312.10766
We propose JailGuard, a universal detection framework for jailbreaking and hijacking attacks across LLMs and MLLMs. JailGuard operates on the principle that attack inputs are inherently less robust than benign inputs, regardless of attack method or modality. Specifically, JailGuard mutates an untrusted input to generate variants and leverages the discrepancy among the variants' responses from the model to distinguish attack samples from benign samples.
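
The sketch below illustrates the variant-divergence idea, assuming a toy character-dropping mutator and a token-level Jaccard divergence score; the actual JailGuard system defines its own mutation operators and divergence measure, and `query_model` is a hypothetical callable standing in for whatever LLM/MLLM interface you use.

```python
import random


def mutate(prompt: str, n_variants: int = 8, drop_rate: float = 0.05) -> list[str]:
    """Generate perturbed variants of an untrusted prompt by randomly
    dropping a small fraction of characters (illustrative mutator only)."""
    variants = []
    for _ in range(n_variants):
        kept = [c for c in prompt if random.random() > drop_rate]
        variants.append("".join(kept))
    return variants


def response_divergence(responses: list[str]) -> float:
    """Average pairwise Jaccard distance between the token sets of the
    responses; a crude stand-in for a semantic divergence measure."""
    token_sets = [set(r.lower().split()) for r in responses]
    dists = []
    for i in range(len(token_sets)):
        for j in range(i + 1, len(token_sets)):
            union = token_sets[i] | token_sets[j]
            inter = token_sets[i] & token_sets[j]
            dists.append(1.0 - (len(inter) / len(union) if union else 1.0))
    return sum(dists) / len(dists) if dists else 0.0


def detect_attack(prompt: str, query_model, threshold: float = 0.5) -> bool:
    """Flag the input as an attack when its perturbed variants yield
    highly inconsistent model responses (divergence above threshold)."""
    variants = mutate(prompt)
    responses = [query_model(v) for v in variants]
    return response_divergence(responses) > threshold


# Example usage (query_model wraps your own model call):
# is_attack = detect_attack(user_input, lambda p: my_llm_generate(p))
```

The threshold and mutation rate here are arbitrary placeholders; in practice they would be tuned on benign traffic so that normal prompts, whose responses remain stable under small perturbations, fall below the flagging threshold.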