feat: Refactor BQ Analytics Plugin to use Structured JSON
Please ensure you have read the contribution guide before creating a pull request.
Link to Issue or Description of Change
1. Link to an existing issue (if applicable):
- Closes: #3724
2. Or, if no issue exists, describe the change:
Problem:
The current BigQueryAgentAnalyticsPlugin implementation stores complex event payloads (such as LLM requests, responses, and tool calls) as unstructured, concatenated strings using pipe delimiters. This makes downstream analysis difficult (requiring complex Regex), prevents efficient querying of nested fields (like token usage), and relies on a hard character limit for truncation that can result in data loss or broken formatting.
Solution:
This PR refactors the plugin to leverage BigQuery's native JSON data type for the content column.
- Structured Storage: Schema updated to store `content` as `JSON`.
- Smart Truncation: Implemented `_recursive_smart_truncate` to safely truncate long string values within the payload without breaking the JSON structure.
- Payload Mutation: Updated the `content_formatter` logic to accept and return a `dict`. This allows users to programmatically redact PII or prune fields before serialization.
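To illustrate the truncation idea described above, here is a minimal sketch. It is not the plugin's actual `_recursive_smart_truncate`; the function name `recursive_smart_truncate`, the `max_len` parameter, and the `TRUNCATION_SUFFIX` marker are assumptions made for this example. The point is that shortening individual string leaves, rather than cutting the serialized text, keeps the payload valid JSON.

```python
import json
from typing import Any

# Hypothetical marker appended to shortened strings; not taken from the PR.
TRUNCATION_SUFFIX = "...[TRUNCATED]"


def recursive_smart_truncate(value: Any, max_len: int) -> Any:
    """Return a copy of `value` with every string longer than `max_len` shortened.

    Containers are rebuilt rather than mutated, and non-string leaves
    (numbers, booleans, None) pass through unchanged.
    """
    if isinstance(value, str):
        if len(value) > max_len:
            return value[:max_len] + TRUNCATION_SUFFIX
        return value
    if isinstance(value, dict):
        return {k: recursive_smart_truncate(v, max_len) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [recursive_smart_truncate(v, max_len) for v in value]
    return value


# Example payload shaped like a tool-call event (field names are illustrative).
payload = {
    "tool_name": "search",
    "args": {"query": "x" * 20},
    "results": ["short", "y" * 20],
    "tokens": 42,
}
truncated = recursive_smart_truncate(payload, max_len=8)

# The result still serializes cleanly because only leaf strings were shortened.
serialized = json.dumps(truncated)
```

Truncating per leaf means one oversized field (for example a long model response) cannot corrupt or crowd out neighbouring fields such as token counts.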
Testing Plan
Please describe the tests that you ran to verify your changes. This is required for all PRs that are not small documentation or typo fixes.
Unit Tests:
- [x] I have added or updated unit tests for my change.
- [x] All unit tests pass locally.
Please include a summary of passed pytest results.
tests/unittests/plugins/test_bigquery_agent_analytics_plugin.py ...........................
======================== 27 passed, 1 warning in 4.25s =========================
Manual End-to-End (E2E) Tests:
Please provide instructions on how to manually test your changes, including any necessary setup or configuration. Please provide logs or screenshots to help reviewers better understand the fix.
- Setup: Configure the plugin with a Google Cloud Project and Dataset.
- Run: Execute an agent that uses tools and generates long text responses.
- Verify: Check the BigQuery table schema to confirm the `content` column is `JSON`. Query the table using `JSON_VALUE(content, '$.tool_name')` to verify structured access works.
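As a rough aid for the Verify step, the sketch below builds two BigQuery SQL strings: one that reads the column type from `INFORMATION_SCHEMA.COLUMNS`, and one that uses the `JSON_VALUE(content, '$.tool_name')` expression quoted above. The helper name `build_verification_queries` and the project, dataset, and table names are hypothetical placeholders, not values from this PR.

```python
def build_verification_queries(project: str, dataset: str, table: str) -> tuple[str, str]:
    """Build SQL to check the `content` column type and to read a nested field.

    Identifiers are interpolated directly, so pass only trusted names.
    """
    # Query 1: the data_type reported for `content` should be JSON.
    schema_sql = (
        "SELECT column_name, data_type\n"
        f"FROM `{project}.{dataset}.INFORMATION_SCHEMA.COLUMNS`\n"
        f"WHERE table_name = '{table}' AND column_name = 'content'"
    )
    # Query 2: structured access to a nested field, grouped per tool.
    access_sql = (
        "SELECT JSON_VALUE(content, '$.tool_name') AS tool_name, COUNT(*) AS calls\n"
        f"FROM `{project}.{dataset}.{table}`\n"
        "WHERE JSON_VALUE(content, '$.tool_name') IS NOT NULL\n"
        "GROUP BY tool_name\n"
        "ORDER BY calls DESC"
    )
    return schema_sql, access_sql


# Placeholder names; substitute your own project, dataset, and table.
schema_sql, access_sql = build_verification_queries(
    "my-project", "agent_logs", "agent_events"
)
print(schema_sql)
print(access_sql)
```

The printed statements can be pasted into the BigQuery console or run through whichever client you already use.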
Checklist
- [x] I have read the CONTRIBUTING.md document.
- [x] I have performed a self-review of my own code.
- [x] I have commented my code, particularly in hard-to-understand areas.
- [x] I have added tests that prove my fix is effective or that my feature works.
- [x] New and existing unit tests pass locally with my changes.
- [x] I have manually tested my changes end-to-end.
- [x] Any dependent changes have been merged and published in downstream modules.
Additional context
⚠️ Breaking Schema Change:
This update changes the BigQuery table schema. Users with existing tables created by previous versions of this plugin will need to delete the table (to allow the plugin to recreate it) or manually migrate the content column from STRING to JSON.
Summary of Changes
Hello @amenegola, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly refactors the BigQuery Analytics Plugin to improve the storage and analysis of agent event data. By transitioning the content column to a native BigQuery JSON type, it moves away from unstructured, pipe-delimited strings, enabling more efficient and flexible querying of nested data. The changes also introduce intelligent truncation logic to maintain data integrity and enhance the custom content formatting capabilities, providing users with better control over their logged data.
Highlights
- Structured Content Storage: The `content` column in the BigQuery table schema has been changed from `STRING` to `JSON`, allowing for structured storage and easier querying of complex event payloads like LLM requests, responses, and tool calls.
- Smart Truncation: A new recursive truncation mechanism (`_recursive_smart_truncate`) is implemented to safely truncate long string values within nested JSON objects without breaking the JSON structure. This prevents data loss and malformed entries, and the `max_content_length` default is increased to 50KB.
- Enhanced Content Formatting: The `content_formatter` now accepts and returns a dictionary, enabling programmatic PII redaction or field pruning before JSON serialization, offering greater control and flexibility over logged data. The `is_truncated` column has been removed as smart truncation ensures valid JSON.
- Refactored Callbacks: All event callback methods have been updated to construct and pass structured dictionaries as content payloads, streamlining the logging process and leveraging the new JSON storage capabilities.
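The dict-in, dict-out formatter described above could look roughly like the following sketch. The function name `redact_pii`, the email pattern, the `DROPPED_KEYS` set, and the event field names are all assumptions for illustration; how the callable is registered with the plugin is not shown here because the summary does not specify it.

```python
import re
from typing import Any

# Deliberately simple email pattern for illustration; not production-grade.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

# Hypothetical keys to prune entirely before the payload is serialized.
DROPPED_KEYS = {"raw_headers"}


def redact_pii(content: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with emails masked and unwanted keys removed."""

    def scrub(value: Any) -> Any:
        if isinstance(value, str):
            return EMAIL_RE.sub("[REDACTED_EMAIL]", value)
        if isinstance(value, dict):
            return {k: scrub(v) for k, v in value.items() if k not in DROPPED_KEYS}
        if isinstance(value, list):
            return [scrub(v) for v in value]
        return value

    return scrub(content)


# Illustrative event payload; the field names are not taken from the plugin.
event = {
    "prompt": "Contact alice@example.com please",
    "raw_headers": {"authorization": "secret"},
    "usage": {"total_tokens": 12},
}
formatted = redact_pii(event)
```

Because the formatter works on the dictionary rather than on a serialized string, redaction and pruning happen before any truncation or JSON encoding, and the input payload is left unmodified.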
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with :thumbsup: and :thumbsdown: on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
/gemini review
Hi @amenegola, thank you for your contribution! We appreciate you taking the time to submit this pull request. Your PR has been received by the team and is currently under review. We will provide feedback as soon as we have an update to share.
Hi @wyf7107, can you please review this?