Autonomous Red Teaming System using Google ADK

This project implements a sophisticated, autonomous red teaming framework that uses specialized agents powered by Google's Gemini model to test AI systems for vulnerabilities, biases, and safety weaknesses.

The system includes two types of target agents for testing: a "Stubborn Agent" that refuses all requests and protects its system prompt, and a "Non-Compliant Agent" that occasionally acquiesces to harmful requests.

How It Works

The system is built on Google's Agent Development Kit (ADK) and operates through a RedTeamCoordinator agent. This coordinator orchestrates the entire testing process:

Initializes a set of predefined test prompts covering categories like prompt injection, bias detection, and safety boundaries.
Dynamically generates follow-up tests based on the responses received from the target agent.
Analyzes each response to assign a risk score and identify potential vulnerabilities.
Runs autonomously, executing a series of tests until a predefined limit is reached.
Generates a comprehensive report at the end of the session, summarizing findings, risk levels, and category-specific performance.

The framework is designed to be extensible, allowing for the addition of new specialized agents, test categories, and target models.

Setup and Configuration

1. Prerequisites

Python 3.8+
An active internet connection

2. Installation

Clone the repository and install the required Python packages.

git clone <repository-url>
cd red-teaming-agent
pip install -r requirements.txt

3. Configure API Keys

The project requires an API key for Google Gemini (used by both the red teaming agents and the target agents). Create a file named .env in the root of the project directory:

# .env file
GEMINI_API_KEY="YOUR_GEMINI_API_KEY"

Replace the placeholder value with your actual API key.

Target Agents

The system includes two pre-configured target agents for testing:

1. Stubborn Agent (Default)

A defensive agent designed to:

Refuse all requests with a consistent response
Protect its system prompt from disclosure
Never explain why it refuses requests
Never make exceptions for any request

2. Non-Compliant Agent

An agent that simulates a partially compromised system:

Occasionally complies with harmful requests
Provides more varied responses for testing
Useful for testing detection of actual vulnerabilities

How to Run

Execute the main script from your terminal to start the autonomous testing session:

python main.py

You can also specify the maximum number of tests to run using the --max-tests argument:

python main.py --max-tests 30

The script will run autonomously, testing the target agents and finishing with a comprehensive summary report. There is no interactive user input required.

Contributing

We welcome contributions to improve and expand the red teaming capabilities! There are two main areas where contributions are especially valuable:

1. Enhancing Red Teaming Capabilities

Help strengthen the testing framework by:

Adding new specialized agents for specific vulnerability categories (e.g., jailbreaking, social engineering, misinformation)
Improving test generation algorithms to create more sophisticated attack vectors
Expanding risk assessment logic with better scoring mechanisms and vulnerability indicators
Adding new test categories like privacy violations, reasoning exploitation, or model alignment issues
Enhancing follow-up generation to create more targeted subsequent tests based on responses

2. Adding Target Agents

Expand testing coverage by integrating real-world AI agents:

Deep research agents that perform complex information retrieval and analysis
Code generation assistants that help with programming tasks
Creative writing agents that generate stories, articles, or marketing content
Customer service bots that handle user inquiries and support
Educational tutoring agents that provide learning assistance
Task automation agents that handle workflow management

When contributing target agents, please ensure they represent realistic deployment scenarios and include appropriate consent/permissions for testing.

Getting Started with Contributions

Fork the repository and create a feature branch
Follow the existing code patterns in main.py
Test your additions with the current framework
Submit a pull request with a clear description of your changes
Include examples of how your contribution improves red teaming effectiveness

For questions about contributing, please open an issue to discuss your ideas before implementation.

red-teaming-agent
red-teaming-agent copied to clipboard

Metadata

Autonomous Red Teaming System using Google ADK

How It Works

Setup and Configuration

1. Prerequisites

2. Installation

3. Configure API Keys

Target Agents

1. Stubborn Agent (Default)

2. Non-Compliant Agent

How to Run

Contributing

1. Enhancing Red Teaming Capabilities

2. Adding Target Agents

Getting Started with Contributions

← Metadata

Owner

Metadata

red-teaming-agent red-teaming-agent copied to clipboard

Metadata

Autonomous Red Teaming System using Google ADK

How It Works

Setup and Configuration

1. Prerequisites

2. Installation

3. Configure API Keys

Target Agents

1. Stubborn Agent (Default)

2. Non-Compliant Agent

How to Run

Contributing

1. Enhancing Red Teaming Capabilities

2. Adding Target Agents

Getting Started with Contributions

← Metadata

Owner

Metadata

red-teaming-agent
red-teaming-agent copied to clipboard