
Add GOAT orchestrator

Open Naominickels opened this issue 1 month ago • 1 comments

Is your feature request related to a problem? Please describe.

There is currently no orchestrator in PyRIT that implements the Generative Offensive Agent Tester (GOAT) attack, which would improve LLM security testing.

Describe the solution you'd like

I have already created a GOAT orchestrator, along with a system prompt that extends GOAT's existing jailbreak strategies with additional ones (e.g. for use by the attacker in the GOAT orchestrator), and would like to add both to the PyRIT repo.

Additional context

The work was done as part of a project with Bosch, so a request for open-sourcing permission has been submitted and is currently being evaluated. It is expected to be approved, but this will take some time.

Naominickels avatar Oct 31 '25 11:10 Naominickels

@Naominickels this is awesome! We have rewritten the codebase in the last couple months so orchestrators are gone and we now use what's called "executors" but it should map to that fairly easily. Needless to say, feel free to ask questions. Happy to work with you on the PR and provide timely feedback whenever it's ready.

romanlutz avatar Oct 31 '25 22:10 romanlutz

Hey @Naominickels,

Does your template for the GOAT attack differ from the following? https://github.com/Azure/PyRIT/pull/930/commits/58828cc6146bf2fa9ce151275da3f731d1d0b9c6

See pyrit/datasets/orchestrators/red_teaming/multi_agent/goat_attacks.yaml. For example, we could extend it if needed. I’d be happy to get your feedback and also hear about the performance you achieved with your implementation.

The reason I’m asking is that I’m trying to figure out the best approach for attacking a target, something like a generic agent that only needs:

  • a strategy (based on the context of the target and its defenses), and

  • a toolbox of attacks (functions it can call, or plain text that defines techniques), such as:

    • Refusal Suppression
    • Dual Response
    • Response Priming

Each of these could be defined and selected from a .yaml file (or similar structure), instead of relying on a highly specialized orchestrator or executor. In other words, I’m exploring whether a more modular, strategy-driven, and configuration-based agent could work effectively.
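To make the idea concrete, here is a minimal sketch of what such a configuration-driven toolbox could look like. It does not use any PyRIT API: `TOOLBOX`, `Technique`, `Strategy`, `load_toolbox`, `select_techniques`, and `render_attacker_context` are all hypothetical names, the dict stands in for whatever a .yaml file would load into, and the descriptions are placeholders rather than text from the GOAT paper.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical toolbox, shaped like the mapping a .yaml file might load into.
# Descriptions are placeholders, not definitions taken from the GOAT paper.
TOOLBOX = {
    "refusal_suppression": {
        "name": "Refusal Suppression",
        "description": "placeholder: technique definition goes here",
    },
    "dual_response": {
        "name": "Dual Response",
        "description": "placeholder: technique definition goes here",
    },
    "response_priming": {
        "name": "Response Priming",
        "description": "placeholder: technique definition goes here",
    },
}


@dataclass(frozen=True)
class Technique:
    key: str
    name: str
    description: str


@dataclass(frozen=True)
class Strategy:
    # Free-text notes about the target and its defenses.
    target_context: str
    # Keys of the toolbox entries this strategy is allowed to use.
    technique_keys: tuple[str, ...]


def load_toolbox(raw: dict) -> dict[str, Technique]:
    """Turn the raw config mapping into typed Technique objects."""
    return {key: Technique(key=key, **fields) for key, fields in raw.items()}


def select_techniques(strategy: Strategy, toolbox: dict[str, Technique]) -> list[Technique]:
    """Return the techniques named by the strategy, in the strategy's order."""
    missing = [k for k in strategy.technique_keys if k not in toolbox]
    if missing:
        raise KeyError(f"unknown techniques: {missing}")
    return [toolbox[k] for k in strategy.technique_keys]


def render_attacker_context(strategy: Strategy, toolbox: dict[str, Technique]) -> str:
    """Build the text block a generic attacker agent would receive."""
    lines = [f"Target context: {strategy.target_context}", "Available techniques:"]
    lines += [f"- {t.name}: {t.description}" for t in select_techniques(strategy, toolbox)]
    return "\n".join(lines)


toolbox = load_toolbox(TOOLBOX)
strategy = Strategy(
    target_context="chat assistant with a refusal-based safety filter",
    technique_keys=("refusal_suppression", "response_priming"),
)
print(render_attacker_context(strategy, toolbox))
```

With this shape, adding or swapping a technique only means editing the config, and the agent code stays generic; whether that holds up against a specialized executor is exactly the open question.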

@romanlutz, if you have any ideas, please feel free to jump in. I had to join the discussion because I was too curious. :)

KutalVolkan avatar Dec 03 '25 11:12 KutalVolkan

My bad, I haven't actually looked at GOAT in detail yet so I didn't realize it's multi-agent. I also hadn't seen your PR (we really should get to that... cc @rlundeen2).

romanlutz avatar Dec 04 '25 13:12 romanlutz

> My bad, I haven't actually looked at GOAT in detail yet so I didn't realize it's multi-agent. I also hadn't seen your PR (we really should get to that... cc @rlundeen2).

No worries, GOAT is not multi-agent. I just wanted to give the strategy agent some attack strategies, and I took all of them from the GOAT paper.

Note: The orchestrator mentioned by @Naominickels probably follows the paper exactly; no multi-agent system is involved in the original GOAT approach. Sorry for the confusion.

My intention was rather to find out how GOAT with a multi-agent approach would perform compared to the standard GOAT approach.

KutalVolkan avatar Dec 04 '25 14:12 KutalVolkan

OK I suppose we can evaluate that when we have the standard one 🙂

romanlutz avatar Dec 04 '25 22:12 romanlutz