FEAT: Add Generic Multi-Agent Orchestrator Pipeline for Red Teaming
Thank you @eugeniavkim for working with me on the concept and design of this agentic multi-agent red teaming pipeline!
Please review when you have a chance. Feedback, suggestions for further modularity, or requests for sample agent system prompts are welcome :)
Overview
This draft PR introduces a new, flexible multi-agent system (MAS) pipeline for red teaming LLMs.
The `MASChatTarget` class enables composing any number of agents (e.g., recon, strategy, red team, etc.) into an ordered chain, each with its own system prompt and context.
Key features:
- Generic `agent_chain` architecture allows two, three, or more agents in any order.
- Agents receive the full conversation history plus optional per-role context.
- No support yet for tool/function-call pipelines; the chain is prompt-based only.
- An example orchestration (in `use_msa_chat_target.py`) demonstrates usage with a strategy and a red-team agent, and can be extended to include recon or other agent roles (see the usage sketch after the note below).
*Visualization of the `MASChatTarget` agent pipeline.*
Note: The class name is now `MASChatTarget` (previously `MoAChatTarget`).
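A minimal usage sketch of the chain described above. The `AgentConfig` name and the constructor keywords are illustrative placeholders, not the final API from this PR; see `use_msa_chat_target.py` for the real example:

```python
# Hypothetical sketch of composing an ordered agent chain with MASChatTarget.
# AgentConfig and the keyword arguments are illustrative, not the PR's API;
# imports are omitted because the module path is introduced by this PR.
strategy_agent = AgentConfig(
    name="strategy",
    system_prompt="Plan a multi-turn approach toward the objective.",
)
red_team_agent = AgentConfig(
    name="red_team",
    system_prompt="Turn the strategy into a concrete adversarial prompt.",
)

# Agents run in the given order; each receives the full conversation history
# plus its optional per-role context.
mas_target = MASChatTarget(agent_chain=[strategy_agent, red_team_agent])
```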
Hello Roman, hey Eugenia :)
I need to check the memory in DuckDB. I'm a bit unsure about when `adversarial_chat` is considered a user and when it is considered an assistant in the context of the `RedTeamingOrchestrator`.
In the current setup, PyRIT is always the user and any target has the assistant role. There's one conversation with `adversarial_chat`, one with `objective_target`, and one with the `scoring_target` (assuming LLM scorers).
In your case this gets complicated because it's not just a single message and response but a chain. We may have to introduce a new role or just call all of the agents "assistant" and differentiate in a different way. Is it fair to assume that it's like this:
User -> agent 1 -> agent 2 -> ... -> agent n -> user
Or does user talk to agent 1, agent 1 to agent 2, etc and it's actually just n-1 separate conversations of 2 participants? In the latter case, it makes sense to call all of them separate conversations with user/assistant roles. In the former case I'm really not sure.
@rlundeen2 or @bashirpartovi may have thoughts.
Thanks, Roman.
Right now, our MAS pipeline implements a single linear chain where user input is passed through each agent in sequence. For PyRIT compatibility, all agents are currently marked as "assistant" (except the initial and target responses, which are "user") when persisted to memory. We track the true MAS agent roles only in our internal `_history`, not in `prompt_metadata`. I could add each MAS agent's role to `prompt_metadata` in the stored `PromptRequestPiece`. Would that be a good fit for auditability and downstream analysis?
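For concreteness, a sketch of what I have in mind. It assumes `prompt_metadata` accepts a dict, and the `mas_agent_role` key is a proposed convention, not an existing PyRIT field:

```python
from pyrit.models import PromptRequestPiece

agent_response = "..."  # text produced by one MAS agent in the chain

# The PyRIT-visible role stays "assistant"; the true MAS role goes into
# metadata so it is available for auditing and downstream analysis.
piece = PromptRequestPiece(
    role="assistant",
    original_value=agent_response,
    prompt_metadata={"mas_agent_role": "strategy"},
)
```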
Potentially. I am curious if @rlundeen2 has thoughts.
Hi @romanlutz @eugeniavkim @rlundeen2,
Quick note: our multi-agent orchestrator currently only supports prompt-based chaining; there are no dynamic tools or data fetches yet (all logic is handled purely through prompt engineering and the agents' system prompt YAMLs). If, say, a recon agent could trigger real actions (like a web search) and pass the results downstream, we'd get much more adaptive and realistic attack flows.
One idea (sketched below): an agent emits an action keyword, the orchestrator intercepts it, runs the tool, and injects the results back into the agent chain. Is this kind of dynamic action pipeline already on your radar, or do you have any early design thoughts? I'll spend more time thinking about implementation options, but probably not until next week.
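A minimal sketch of that interception loop. All names are hypothetical; the tool registry here is just a dict of Python callables:

```python
import json

# Hypothetical tool registry: action keyword -> Python callable.
TOOLS = {
    "web_search": lambda query: f"Top results for {query!r} ...",  # stub
}

def maybe_run_tool(agent_output: str) -> str | None:
    """If the agent emitted an action JSON, run the tool and return results."""
    try:
        message = json.loads(agent_output)
    except json.JSONDecodeError:
        return None  # plain prose, no action requested
    action = message.get("action")
    if action in TOOLS:
        return TOOLS[action](message.get("query", ""))
    return None

# Inside the orchestrator's chain loop (sketch):
#   output = agent.respond(history)
#   tool_result = maybe_run_tool(output)
#   if tool_result is not None:
#       history.append({"role": "tool", "content": tool_result})
```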
Update: Exploring dynamic tool use
Agents can emit an action keyword (e.g., `"action": "web_search"`); the orchestrator detects this, runs the corresponding tool (a Python function), and injects the results into the context for downstream agents. For quick PoCs (outside PyRIT), I usually call the OpenAI function-calling API for simple web-search actions, or just use Python's `requests` plus BeautifulSoup to scrape and parse web content and let the LLM process that (see the sketch below). This works for fast demos, but for longer-term, production-grade features, using vendor APIs may be more robust.
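For reference, the kind of quick scrape-and-parse helper I mean (the URL and tag choices are illustrative):

```python
import requests
from bs4 import BeautifulSoup

def fetch_page_text(url: str, timeout: int = 10) -> str:
    """Fetch a page and return its visible paragraph text for the LLM."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Keep only paragraph text; real recon tooling would filter/summarize more.
    return "\n".join(p.get_text(strip=True) for p in soup.find_all("p"))

# Example (hypothetical URL): feed the text to a downstream agent's context.
# page_text = fetch_page_text("https://example.com/article")
```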
If anyone has a preferred pattern for tool execution, or opinions on whether to always use LLM-native tool calls or custom Python logic, let me know :)
Current approach: related issue and pointers from @romanlutz: #1006
I'm currently working on refactoring the multi-agent orchestrator to align with the new `MultiTurnAttackStrategy` interface. Expected completion: 29.09.2025.