Fix: LiteLLM Streaming Content Duplication in Tool Call Responses

Open thesynapses opened this issue 1 month ago • 3 comments

Summary

Fixes content duplication in streaming responses when using LiteLLM models with ADK's planning features (e.g., PlanReActPlanner). Planning and reasoning text was appearing twice: once during streaming as individual chunks, and again in the aggregated tool-call message.

Resolves #3697

Problem

When the model generates planning/reasoning text (e.g., <PLANNING>I need to search...</PLANNING>) followed by tool calls during streaming:

  1. Text chunks are streamed to users in real time (lines 1288-1296)
  2. The same text is included again in the aggregated tool-call message as content=text (line 1352)

This violates the OpenAI/LiteLLM convention that tool-call-only messages should have content=None.
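
To make the shape of the bug concrete, here is a minimal sketch of the pattern. The streaming loop and helper names below are simplified stand-ins for the code at the cited lines, not the exact source:

from litellm import ChatCompletionAssistantMessage

# Illustrative only: stream, partial_response, and tool_calls are stand-ins.
streamed_text = ""
for chunk in stream:
    if chunk.text:
        streamed_text += chunk.text
        yield partial_response(chunk.text)  # user already sees the text here

# Before the fix, the aggregated tool-call message repeated the same text:
ChatCompletionAssistantMessage(
    role="assistant",
    content=streamed_text,  # duplicate of what was just streamed
    tool_calls=tool_calls,
)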

Solution

Changed line 1352 to set content=None for tool-call messages:

ChatCompletionAssistantMessage(
    role="assistant",
    content=None,  # FIX: Avoid duplication, follow OpenAI spec
    tool_calls=tool_calls,
)
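
For context: in the OpenAI chat-completions format, an assistant turn that only invokes tools conventionally carries content=None (or omits content entirely), and the tool results come back afterwards in separate tool-role messages.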

Why This Works

  • Planning text already streamed: Users see it in real time via the individual chunks
  • Preserved in thought_parts: Reasoning is captured separately at line 1357 for the conversation history
  • Follows API standards: The OpenAI and Anthropic (Claude) APIs expect content=None for tool-only assistant messages
  • Correct semantics: Tool-call messages represent function invocations, not answer text
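
As a rough illustration of the second bullet above, the reasoning travels as thought parts rather than as message content. The helper name and exact signature here are assumptions based on the cited line, not verified source:

from google.genai import types

# Names are illustrative: planning_text and tool_call_message are stand-ins.
thought_parts = [types.Part(text=planning_text, thought=True)]
llm_response = _message_to_generate_content_response(
    tool_call_message,            # the assistant message with content=None
    thought_parts=thought_parts,  # reasoning preserved for conversation history
)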

Impact

  • Eliminates content duplication
  • Aligns with OpenAI/LiteLLM conventions
  • Preserves reasoning context via thought_parts
  • Clean conversation history without redundant content
  • Proper semantic representation of tool-call turns

Testing

Tested with:

  • LiteLLM models (Claude, GPT) with planning workflows
  • Streaming enabled with tool calls
  • Multi-turn conversations requiring tool usage
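
For reference, a minimal setup of the kind exercised above might look as follows. The tool body and model id are placeholders, and the import paths reflect adk-python's public layout as best I know it, so treat the snippet as a sketch rather than a verified test:

from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm
from google.adk.planners import PlanReActPlanner

def search(query: str) -> dict:
    """Toy tool so the model has something to call."""
    return {"results": [f"stub result for {query}"]}

agent = LlmAgent(
    name="planning_agent",
    model=LiteLlm(model="anthropic/claude-3-5-sonnet-20241022"),
    planner=PlanReActPlanner(),
    tools=[search],
)
# Run with streaming enabled and check that the <PLANNING> text appears only once.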

Additional Context

  • Affects only streaming mode with tool calls
  • Non-streaming path already handles this correctly
  • thought_parts parameter preserves reasoning separately from message content
  • No breaking changes to existing APIs

thesynapses — Nov 24 '25 21:11

Summary of Changes

Hello @thesynapses, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves critical issues related to content handling and API compliance in LiteLLM streaming responses, especially when tool calls are involved. It refines how planning text and tool responses are processed during streaming to ensure a clean, accurate, and API-compliant conversation history, while also improving the agent runner's ability to interpret completion states from various models.

Highlights

  • Eliminate Content Duplication in Streaming Tool Call Responses: Previously, planning and reasoning text would appear twice in streaming responses when using LiteLLM models with ADK's planning features: once as individual chunks and again in the aggregated tool-call message. This fix ensures such text is streamed only once, providing a cleaner user experience.
  • Align Tool-Call Messages with OpenAI/LiteLLM API Conventions: Tool-call-only messages now correctly set content=None, adhering to API specifications. This prevents redundant content in conversation history and ensures proper semantic representation of tool-call turns.
  • Prevent Double-Serialization of Tool Responses: Addressed an issue where _safe_json_serialize would double-serialize already-JSON string tool responses, leading to malformed (triple-nested) JSON and parsing failures for models like Claude/GPT. The change now checks if the response is already a string before serialization.
  • Ensure Proper finish_reason Mapping for Streaming Responses: Implemented logic to correctly map the finish_reason to the FinishReason enum for aggregated LlmResponse objects in streaming mode. This ensures the ADK agent runner accurately recognizes completion states, mirroring behavior from the non-streaming path.
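
A hedged sketch of the serialization guard described above; the real _safe_json_serialize body may differ, so treat this as an assumption about its shape:

import json

def _safe_json_serialize(obj) -> str:
    # A string response is usually already JSON: pass it through rather than
    # wrapping it in another layer of quoting.
    if isinstance(obj, str):
        return obj
    try:
        return json.dumps(obj, ensure_ascii=False)
    except (TypeError, ValueError):
        return str(obj)

Similarly, a sketch of the finish_reason mapping; the table entries are illustrative, with only the FinishReason enum from google.genai.types taken as given:

from google.genai import types

_FINISH_REASON_MAP = {
    "stop": types.FinishReason.STOP,
    "tool_calls": types.FinishReason.STOP,
    "length": types.FinishReason.MAX_TOKENS,
    "content_filter": types.FinishReason.SAFETY,
}

def _map_finish_reason(reason: str | None) -> types.FinishReason | None:
    # LiteLLM reports OpenAI-style string reasons; map them onto the enum the
    # ADK runner checks, falling back to OTHER for anything unrecognized.
    if reason is None:
        return None
    return _FINISH_REASON_MAP.get(reason, types.FinishReason.OTHER)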

gemini-code-assist[bot] — Nov 24 '25 21:11

Hi @thesynapses, thank you for your work on this pull request; we appreciate the effort you've invested. Before we can proceed with the review, could you please fix the lint errors? You can use autoformat.sh.

ryanaiagent — Nov 30 '25 05:11