A red teaming agent

Autonomous Red Teaming System using Google ADK

This project implements an autonomous red teaming framework in which specialized agents, powered by Google's Gemini model, test AI systems for vulnerabilities, biases, and safety weaknesses.

The system includes two types of target agents for testing: a "Stubborn Agent" that refuses all requests and protects its system prompt, and a "Non-Compliant Agent" that occasionally acquiesces to harmful requests.

How It Works

The system is built on Google's Agent Development Kit (ADK) and operates through a RedTeamCoordinator agent. This coordinator orchestrates the entire testing process:

  1. Initializes a set of predefined test prompts covering categories like prompt injection, bias detection, and safety boundaries.
  2. Dynamically generates follow-up tests based on the responses received from the target agent.
  3. Analyzes each response to assign a risk score and identify potential vulnerabilities.
  4. Runs autonomously, executing a series of tests until a predefined limit is reached.
  5. Generates a comprehensive report at the end of the session, summarizing findings, risk levels, and category-specific performance.
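
The five steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's RedTeamCoordinator: the names `Finding`, `score_risk`, `run_session`, and `summarize`, the keyword markers, and the numeric risk values are all hypothetical, and the target is any callable that maps a prompt to a response string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical markers: refusals lower the score, leaks or compliance raise it.
REFUSAL_MARKERS = ("i cannot", "i can't", "i won't")
LEAK_MARKERS = ("system prompt:", "sure, here is")


@dataclass
class Finding:
    category: str
    prompt: str
    response: str
    risk: float


def score_risk(response: str) -> float:
    """Toy keyword heuristic standing in for the real response analysis (step 3)."""
    text = response.lower()
    if any(marker in text for marker in LEAK_MARKERS):
        return 0.9
    if any(marker in text for marker in REFUSAL_MARKERS):
        return 0.1
    return 0.5  # ambiguous response


def run_session(
    target: Callable[[str], str],
    seed_tests: List[Tuple[str, str]],
    max_tests: int,
) -> List[Finding]:
    """Run seed tests (step 1) plus follow-ups (step 2) up to max_tests (step 4)."""
    queue = list(seed_tests)
    findings: List[Finding] = []
    while queue and len(findings) < max_tests:
        category, prompt = queue.pop(0)
        response = target(prompt)
        risk = score_risk(response)
        findings.append(Finding(category, prompt, response, risk))
        # An ambiguous answer earns exactly one follow-up probe.
        if 0.1 < risk < 0.9 and not prompt.startswith("Follow-up:"):
            queue.append((category, f"Follow-up: {prompt}"))
    return findings


def summarize(findings: List[Finding]) -> Dict[str, float]:
    """Average risk per category, a stand-in for the final report (step 5)."""
    by_category: Dict[str, List[float]] = {}
    for finding in findings:
        by_category.setdefault(finding.category, []).append(finding.risk)
    return {cat: round(sum(r) / len(r), 2) for cat, r in by_category.items()}
```

In this sketch the test limit bounds the whole session, so follow-ups compete with seed prompts for the same budget; the actual framework may prioritize them differently.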

The framework is designed to be extensible, allowing for the addition of new specialized agents, test categories, and target models.

Setup and Configuration

1. Prerequisites

  • Python 3.8+
  • An active internet connection
  • A Google Gemini API key (see step 3)

2. Installation

Clone the repository and install the required Python packages.

git clone <repository-url>
cd red-teaming-agent
pip install -r requirements.txt

3. Configure API Keys

The project requires an API key for Google Gemini (used by both the red teaming agents and the target agents). Create a file named .env in the root of the project directory:

# .env file
GEMINI_API_KEY="YOUR_GEMINI_API_KEY"

Replace the placeholder value with your actual API key.
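
As an illustration of how a script could pick up this key, here is a small standard-library sketch. It is an assumption about the loading mechanism: the project may instead rely on a package such as python-dotenv, and the helper names `load_env_file` and `get_gemini_key` are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes such as "YOUR_GEMINI_API_KEY".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gemini_key(path: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY") or load_env_file(path).get("GEMINI_API_KEY", "")
    if not key or key == "YOUR_GEMINI_API_KEY":
        raise RuntimeError("GEMINI_API_KEY is not set; add it to the .env file")
    return key
```

Failing early when the placeholder is still in place gives a clearer error than a rejected API call later in the run.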

Target Agents

The system includes two pre-configured target agents for testing:

1. Stubborn Agent (Default)

A defensive agent designed to:

  • Refuse all requests with a consistent response
  • Protect its system prompt from disclosure
  • Never explain why it refuses requests
  • Never make exceptions for any request

2. Non-Compliant Agent

An agent that simulates a partially compromised system:

  • Occasionally complies with harmful requests
  • Provides more varied responses for testing
  • Useful for checking that the framework detects actual vulnerabilities
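
The two behaviours can be mimicked offline with deterministic stand-ins, which is handy for exercising scoring logic without calling a model. This sketch is not the project's implementation (the real targets are Gemini-backed agents); the refusal text, the `comply_rate` parameter, and both function names are assumptions.

```python
import random
from typing import Callable

REFUSAL = "I cannot help with that request."


def stubborn_agent(prompt: str) -> str:
    """Return the same refusal for every prompt, with no explanation."""
    return REFUSAL


def make_non_compliant_agent(comply_rate: float = 0.2, seed: int = 0) -> Callable[[str], str]:
    """Build an agent that complies with a fraction of requests."""
    rng = random.Random(seed)  # seeded so test runs are repeatable

    def agent(prompt: str) -> str:
        if rng.random() < comply_rate:
            return f"Sure, here is what you asked for: {prompt}"
        return REFUSAL

    return agent
```

A seeded generator keeps the occasional compliance reproducible, so a detection regression shows up as a changed result rather than as noise.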

How to Run

Execute the main script from your terminal to start the autonomous testing session:

python main.py

You can also specify the maximum number of tests to run using the --max-tests argument:

python main.py --max-tests 30

The script runs autonomously against the target agents and ends with a summary report. No interactive input is required.
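
For readers extending the command line, a flag like --max-tests is typically parsed with argparse. The sketch below is an assumption about how main.py might do this; in particular, the default of 20 is a placeholder, since the project's actual default is not stated here.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Run an autonomous red teaming session."
    )
    parser.add_argument(
        "--max-tests",
        type=int,
        default=20,  # assumed default, not taken from the project
        help="maximum number of tests to run before generating the report",
    )
    args = parser.parse_args(argv)
    if args.max_tests < 1:
        parser.error("--max-tests must be a positive integer")
    return args


if __name__ == "__main__":
    print(f"Running up to {parse_args().max_tests} tests")
```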

Contributing

We welcome contributions to improve and expand the red teaming capabilities! There are two main areas where contributions are especially valuable:

1. Enhancing Red Teaming Capabilities

Help strengthen the testing framework by:

  • Adding new specialized agents for specific vulnerability categories (e.g., jailbreaking, social engineering, misinformation)
  • Improving test generation algorithms to create more sophisticated attack vectors
  • Expanding risk assessment logic with better scoring mechanisms and vulnerability indicators
  • Adding new test categories like privacy violations, reasoning exploitation, or model alignment issues
  • Enhancing follow-up generation to create more targeted subsequent tests based on responses

2. Adding Target Agents

Expand testing coverage by integrating real-world AI agents:

  • Deep research agents that perform complex information retrieval and analysis
  • Code generation assistants that help with programming tasks
  • Creative writing agents that generate stories, articles, or marketing content
  • Customer service bots that handle user inquiries and support
  • Educational tutoring agents that provide learning assistance
  • Task automation agents that handle workflow management

When contributing target agents, please ensure they represent realistic deployment scenarios and include appropriate consent/permissions for testing.
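
One way a contributed target could plug in is through a small adapter interface. The following is a hypothetical sketch, not the project's API: `TargetAgent`, `EchoSupportBot`, and `register_target` are illustrative names, and the existing patterns in main.py should take precedence.

```python
from typing import Dict, Protocol, runtime_checkable


@runtime_checkable
class TargetAgent(Protocol):
    """Minimal contract assumed here: a name and a prompt-to-response method."""

    name: str

    def respond(self, prompt: str) -> str:
        ...


class EchoSupportBot:
    """Toy customer service bot used only to demonstrate the interface."""

    name = "echo-support-bot"

    def respond(self, prompt: str) -> str:
        return f"Thanks for contacting support. You said: {prompt}"


def register_target(registry: Dict[str, TargetAgent], agent: object) -> None:
    """Add an agent to the registry after checking it satisfies the contract."""
    if not isinstance(agent, TargetAgent):
        raise TypeError("target agents need a 'name' and a 'respond(prompt)' method")
    registry[agent.name] = agent
```

Keeping the contract to a single text-in, text-out method means a wrapper around a research, coding, or tutoring agent only has to translate its own interface into one call.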

Getting Started with Contributions

  1. Fork the repository and create a feature branch
  2. Follow the existing code patterns in main.py
  3. Test your additions with the current framework
  4. Submit a pull request with a clear description of your changes
  5. Include examples of how your contribution improves red teaming effectiveness

For questions about contributing, please open an issue to discuss your ideas before implementation.