PyRIT
FEAT: Add anecdoctor orchestrator to build attack prompts from real-world examples.
Description
Adding a new orchestrator that constructs attack prompts from real-world examples. This orchestrator performs best for informational harms that center on a consistent narrative (for example, a group of examples identified by clustering a larger attack dataset). There are two options:
- Baseline: constructs attack prompts directly from few-shot examples.
- use_knowledge_graph=True: uses a processing target to first construct a knowledge graph, then builds attack prompts from it. This produces attack prompts that are more robust to cultural and linguistic differences in the example data.
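To illustrate how the two options differ, here is a minimal sketch in plain Python. It assumes the processing target can be modeled as any callable that maps a prompt string to a response string, and it assumes the knowledge graph is returned as text (subject, relation, object triples). The function names `build_few_shot_prompt` and `build_kg_prompt`, the prompt wording, and the triple format are all hypothetical and are not the orchestrator's actual API; the examples are harmless toy placeholders.

```python
from typing import Callable, Sequence


def build_few_shot_prompt(examples: Sequence[str], language: str, content_type: str) -> str:
    """Baseline (hypothetical): embed the raw example texts as few-shot examples."""
    shots = "\n".join(
        f"### Example {i}\n{text}" for i, text in enumerate(examples, start=1)
    )
    return (
        f"Write a new {content_type} in {language} that is consistent with the "
        f"narrative shared by the following examples.\n\n{shots}"
    )


def build_kg_prompt(
    examples: Sequence[str],
    language: str,
    content_type: str,
    processing_target: Callable[[str], str],
) -> str:
    """Knowledge-graph mode (hypothetical): extract a graph first, then prompt from it.

    The final prompt contains only the extracted graph, not the raw examples,
    which is the intuition for why it is less tied to the examples' wording.
    """
    extraction_request = (
        "Extract a knowledge graph of (subject, relation, object) triples "
        "describing the narrative shared by these texts:\n\n" + "\n".join(examples)
    )
    knowledge_graph = processing_target(extraction_request)
    return (
        f"Write a new {content_type} in {language} that is consistent with "
        f"this knowledge graph:\n\n{knowledge_graph}"
    )


def fake_target(request: str) -> str:
    """Stand-in for an LLM processing target; always returns a fixed toy graph."""
    return "(town fountain, turns, blue on Tuesdays)"


# Toy usage with placeholder examples.
examples = [
    "Toy post A: the town fountain turns blue on Tuesdays.",
    "Toy post B: locals say the fountain is blue every Tuesday.",
]
few_shot = build_few_shot_prompt(examples, language="English", content_type="social media post")
kg = build_kg_prompt(examples, "English", "social media post", processing_target=fake_target)
print(few_shot)
print(kg)
```

In this sketch the knowledge-graph path costs one extra call to the processing target, and the final prompt is conditioned on an abstracted graph rather than on example phrasing that may be specific to one language or culture.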
This contribution builds on a collaboration with @eugeniavkim.
Tests and Documentation
Tests for the orchestrator are included. We also include documentation (jupytext) with a toy dataset, as well as a link to real-world data on which the method works well. A paper validating the method is in revision and will be posted to arXiv in the near future.