PyRIT

FEAT Single turn crescendo

Open romanlutz opened this issue 1 year ago • 3 comments

Is your feature request related to a problem? Please describe.

We don't support single turn crescendo yet. This should be added.

Paper: https://arxiv.org/pdf/2409.03131v1

GitHub repo (results only): https://github.com/alanaqrawi/STCA

Describe the solution you'd like

The tricky part is that for every goal/objective (e.g., "how to create a Molotov cocktail") the conversation looks very different. We'll need to be able to generate the entire conversation up to the n-th step and then put it into a single prompt. The assumption here has to be that the attack target is a single-turn target (otherwise we can just use "normal" crescendo). So the red teaming LLM has to generate both sides of the conversation. An alternative (mentioned by Alan, the author of the paper) is to run full Crescendo, keep the questions and responses, and then put them in a single prompt. That may or may not be possible in actual operations (and it's definitely not possible with single-turn targets).

Importantly, n should be configurable. The paper discusses how to choose n, and we probably want to be flexible.
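A minimal sketch of the final step: collapsing an already-generated conversation into one prompt, keeping only the first n turns. The same step applies whether the turns come from the red teaming LLM writing both sides or from a full Crescendo run. `Turn` and `build_single_turn_prompt` are hypothetical names, not PyRIT APIs, and the plain `User:`/`Assistant:` transcript layout is an assumption; the paper's prompts may format the history differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    """One simulated exchange (hypothetical type, not a PyRIT class)."""
    user: str
    assistant: str


def build_single_turn_prompt(turns: list[Turn], n: int, final_question: str) -> str:
    """Collapse the first n simulated turns plus a final question into one prompt.

    Hypothetical helper for illustration only; the transcript layout is an assumption.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if n > len(turns):
        raise ValueError(f"n={n} exceeds the {len(turns)} available turns")
    lines: list[str] = []
    for turn in turns[:n]:
        lines.append(f"User: {turn.user}")
        lines.append(f"Assistant: {turn.assistant}")
    # Leave the last user message unanswered so the target continues from it.
    lines.append(f"User: {final_question}")
    lines.append("Assistant:")
    return "\n".join(lines)


# Usage with placeholder text (no real conversation content):
turns = [Turn("Q1", "A1"), Turn("Q2", "A2"), Turn("Q3", "A3")]
print(build_single_turn_prompt(turns, n=2, final_question="Q4"))
```

Making n a plain argument keeps it easy to sweep different values, which fits the goal of staying flexible on n.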

The final solution needs to have tests and a simple notebook (like all orchestrators). There's some freedom in how to implement this, depending on what works best for generating the conversation:

  • custom orchestrator: first generate the conversation (in a single step or in multiple steps), then send it to the target
  • converter: the converter generates the conversation from just the goal (it's not a typical converter, though)
  • another way? If you choose this, please discuss it here with the dev team first.
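The orchestrator and converter options differ mainly in whether the target call happens inside the component. A minimal sketch, assuming two hypothetical callables (`generate`, backed by the red teaming LLM that writes both sides, and `send`, backed by the single-turn target), shows both shapes: `build_prompt` is the converter-like step (goal in, prompt out) and `run` is the orchestrator-like step (build, then send once). None of these names are PyRIT interfaces, and the transcript layout is an assumption.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical signatures for illustration; these are NOT PyRIT interfaces.
# generate(goal, n) -> (n simulated (user, assistant) turns, final user question)
GenerateFn = Callable[[str, int], tuple[list[tuple[str, str]], str]]
# send(prompt) -> the single-turn target's response text
SendFn = Callable[[str], str]


@dataclass
class SingleTurnCrescendoSketch:
    generate: GenerateFn  # red teaming LLM writes BOTH sides of the conversation
    send: SendFn          # single-turn attack target, called exactly once
    n: int = 3            # number of simulated turns (configurable)

    def build_prompt(self, goal: str) -> str:
        """Converter-style step: goal in, one prompt out, no target call."""
        turns, final_question = self.generate(goal, self.n)
        if len(turns) != self.n:
            raise ValueError(f"expected {self.n} turns, got {len(turns)}")
        lines: list[str] = []
        for user, assistant in turns:
            lines += [f"User: {user}", f"Assistant: {assistant}"]
        lines += [f"User: {final_question}", "Assistant:"]
        return "\n".join(lines)

    def run(self, goal: str) -> str:
        """Orchestrator-style step: build the prompt, then send it to the target."""
        return self.send(self.build_prompt(goal))


# Usage with stand-in callables (placeholders, no real model involved):
def fake_generate(goal: str, n: int) -> tuple[list[tuple[str, str]], str]:
    return [(f"q{i}", f"a{i}") for i in range(n)], f"final question about {goal}"


sent: list[str] = []


def fake_send(prompt: str) -> str:
    sent.append(prompt)
    return "target response"


sketch = SingleTurnCrescendoSketch(generate=fake_generate, send=fake_send, n=2)
print(sketch.run("topic"))
```

Keeping `build_prompt` separate from `run` means the same generation logic could back either a converter or an orchestrator; which one fits better is the open question above.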

Describe alternatives you've considered, if relevant

Alternatively, one could pre-generate such single turn crescendo templates for hundreds of goals, but that would never be comprehensive.

Additional context

One tricky aspect is that the simulated responses need to be somewhat similar to how the target model itself responds. Otherwise, the target may get "suspicious" (not trying to anthropomorphize here, but it's the simplest way to explain what I mean) and refuse to comply.

romanlutz, Sep 20 '24 21:09

Hey! I'm up for it!

roeybc, Sep 20 '24 21:09

Let me know if you have questions, folks.

alanaqrawi, Sep 20 '24 22:09

Unassigning this to make it available for anyone who might be interested. This is a fun one, so grab it while it's there 😆

romanlutz, Mar 10 '25 06:03