Feat/tau2 adk

Open heiko-hotz opened this issue 3 months ago • 0 comments

Description

This pull request introduces a new evaluation harness that lets agents built with Google's Agent Development Kit (ADK) be evaluated with the Tau2 Bench evaluation framework.

This initial version includes the core components needed for end-to-end evaluation:

Main Evaluation Runner (run_evaluation.py):

Orchestrates the conversational flow between the Tau2 User Simulator and the ADK Agent.
Dynamically loads ADK agents from a specified file path.
Injects the task-specific Tau2 domain policy into the ADK agent's instructions at runtime.
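The loading and policy-injection steps above can be sketched roughly as follows. This is a minimal illustration, not the code in run_evaluation.py: the function names `load_agent` and `inject_policy`, the `root_agent` attribute name, and the "Domain policy" separator are all assumptions made for the example. Only the standard-library `importlib` calls are real APIs.

```python
import importlib.util
from types import ModuleType
from typing import Any


def load_module_from_path(path: str, module_name: str = "loaded_adk_agent") -> ModuleType:
    """Import a Python file by path without requiring it to be on sys.path."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load a module from {path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def load_agent(path: str, attr: str = "root_agent") -> Any:
    """Return the agent object exposed by the module.

    The attribute name `root_agent` is an assumption for this sketch.
    """
    module = load_module_from_path(path)
    if not hasattr(module, attr):
        raise AttributeError(f"{path!r} does not define {attr!r}")
    return getattr(module, attr)


def inject_policy(base_instruction: str, policy: str) -> str:
    """Append the domain policy to the agent's existing instruction text.

    The separator heading is hypothetical; the harness may format this differently.
    """
    return f"{base_instruction.rstrip()}\n\n# Domain policy\n{policy.strip()}"
```

Building the combined instruction as a new string, rather than editing the agent file, keeps the same agent definition reusable across domains with different policies.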

Tool Mapping & Translation Layer (harness/tool_mapper.py):

Provides a simple, extensible system for mapping tool names and arguments from the ADK agent's perspective to the Tau2 environment's implementation.
Intercepts FunctionCall events from the ADK agent, translates them, and executes the corresponding tool within the Tau2 environment.
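One way to picture the mapping layer is a lookup table from agent-side tool names to environment-side names, plus per-tool argument renames. The sketch below is an assumption-laden illustration rather than the contents of harness/tool_mapper.py: `ToolMapping`, `translate_call`, `execute_call`, and the tool and argument names in the usage note are all hypothetical, and the Tau2 environment is stood in for by a plain dictionary of callables.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Mapping, Tuple


@dataclass(frozen=True)
class ToolMapping:
    """Hypothetical mapping entry: target tool name plus argument renames."""
    target_name: str
    arg_renames: Dict[str, str] = field(default_factory=dict)


def translate_call(
    name: str,
    args: Mapping[str, Any],
    mappings: Mapping[str, ToolMapping],
) -> Tuple[str, Dict[str, Any]]:
    """Translate an agent-side call into the environment's name and arguments.

    Calls with no mapping entry pass through unchanged.
    """
    mapping = mappings.get(name)
    if mapping is None:
        return name, dict(args)
    translated = {mapping.arg_renames.get(key, key): value for key, value in args.items()}
    return mapping.target_name, translated


def execute_call(
    name: str,
    args: Mapping[str, Any],
    mappings: Mapping[str, ToolMapping],
    env_tools: Mapping[str, Callable[..., Any]],
) -> Any:
    """Translate the call, then run the matching tool from the stand-in environment."""
    target, target_args = translate_call(name, args, mappings)
    if target not in env_tools:
        raise KeyError(f"No environment tool named {target!r}")
    return env_tools[target](**target_args)
```

For instance, with a mapping from a hypothetical agent tool `lookup_booking(booking_id)` to an environment tool `get_reservation_details(reservation_id)`, `execute_call("lookup_booking", {"booking_id": "ABC123"}, mappings, env_tools)` would invoke the environment tool with `reservation_id="ABC123"`. Extending the harness to a new domain would then amount to adding entries to the table.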

Sample ADK Agent (sample_adk_agent/):

A fully functional example agent for the airline domain is included.
It serves as a template for structuring an ADK agent so that it is compatible with this harness.

Comprehensive Documentation (README.md):

A detailed README.md explains the project's purpose and architecture and provides instructions for setup, usage, and extension to new domains.

Oct 01 '25 13:10 heiko-hotz