# Fix: LiteLLM Streaming Content Duplication in Tool Call Responses
## Summary

Fixes content duplication in streaming responses when using LiteLLM models with ADK's planning features (e.g., `PlanReActPlanner`). Planning and reasoning text was appearing twice: once during streaming as individual chunks, and again in the aggregated tool-call message.
Resolves #3697
## Problem

When the model generates planning/reasoning text (e.g., `<PLANNING>I need to search...</PLANNING>`) followed by tool calls during streaming:
- Text chunks are streamed to users in real time (lines 1288-1296)
- The same text is included again in the aggregated tool-call message with `content=text` (line 1352)

This violates OpenAI/LiteLLM conventions, where tool-call-only messages should have `content=None`.
## Solution

Changed line 1352 to set `content=None` for tool-call messages:
```python
ChatCompletionAssistantMessage(
    role="assistant",
    content=None,  # FIX: Avoid duplication, follow OpenAI spec
    tool_calls=tool_calls,
)
```
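For context, here is a minimal sketch of the streaming aggregation path this change touches. It is illustrative only: `aggregate_stream` and the shape of `chunks` are assumptions, not ADK's actual names, though `ChatCompletionAssistantMessage` is the same litellm type used above.

```python
from litellm import ChatCompletionAssistantMessage


async def aggregate_stream(chunks):
  """Illustrative sketch: stream text as it arrives, then emit one
  aggregated tool-call message with content=None."""
  text_parts = []
  tool_calls = []
  async for chunk in chunks:  # `chunks` stands in for the LiteLLM stream
    delta = chunk.choices[0].delta
    if delta.content:
      text_parts.append(delta.content)
      yield delta.content  # streamed to the user exactly once, in real time
    if delta.tool_calls:
      tool_calls.extend(delta.tool_calls)
  if tool_calls:
    # The buffered text is NOT repeated here; it is carried separately as
    # thought parts (see "Why This Works" below).
    yield ChatCompletionAssistantMessage(
        role="assistant",
        content=None,
        tool_calls=tool_calls,
    )
```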
## Why This Works

- **Planning text is already streamed**: users see it in real time via individual chunks
- **Preserved in `thought_parts`**: reasoning is captured separately at line 1357 for conversation history (see the sketch below)
- **Follows API standards**: OpenAI, Claude, and GPT APIs expect `content=None` for tool-only messages
- **Correct semantics**: tool-call messages represent function invocations, not answer text
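A hedged illustration of the `thought_parts` idea, using the `thought` flag on `google.genai` parts; the variable names here are assumptions for the sketch, not ADK's actual code:

```python
from google.genai import types

# Hypothetical name: `streamed_text` stands for the planning/reasoning
# text buffered while streaming.
streamed_text = "<PLANNING>I need to search...</PLANNING>"

# The buffered text is kept as a "thought" part on the aggregated
# response rather than as the assistant message's content, so the
# conversation history retains the reasoning without duplicating it.
thought_parts = [types.Part(text=streamed_text, thought=True)]
```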
## Impact
- Eliminates content duplication
- Aligns with OpenAI/LiteLLM conventions
- Preserves reasoning context via `thought_parts`
- Clean conversation history without redundant content
- Proper semantic representation of tool-call turns
## Testing
Tested with:
- LiteLLM models (Claude, GPT) with planning workflows
- Streaming enabled with tool calls
- Multi-turn conversations requiring tool usage
## Additional Context
- Affects only streaming mode with tool calls
- The non-streaming path already handles this correctly
- The `thought_parts` parameter preserves reasoning separately from message content
- No breaking changes to existing APIs
## Summary of Changes
Hello @thesynapses, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request resolves critical issues related to content handling and API compliance in LiteLLM streaming responses, especially when tool calls are involved. It refines how planning text and tool responses are processed during streaming to ensure a clean, accurate, and API-compliant conversation history, while also improving the agent runner's ability to interpret completion states from various models.
## Highlights
- **Eliminate Content Duplication in Streaming Tool Call Responses**: Previously, planning and reasoning text would appear twice in streaming responses when using LiteLLM models with ADK's planning features: once as individual chunks and again in the aggregated tool-call message. This fix ensures such text is streamed only once, providing a cleaner user experience.
- **Align Tool-Call Messages with OpenAI/LiteLLM API Conventions**: Tool-call-only messages now correctly set `content=None`, adhering to API specifications. This prevents redundant content in conversation history and ensures proper semantic representation of tool-call turns.
- **Prevent Double-Serialization of Tool Responses**: Addressed an issue where `_safe_json_serialize` would double-serialize already-JSON string tool responses, leading to malformed (triple-nested) JSON and parsing failures for models like Claude/GPT. The change now checks if the response is already a string before serialization (see the first sketch below).
- **Ensure Proper `finish_reason` Mapping for Streaming Responses**: Implemented logic to correctly map the `finish_reason` to the `FinishReason` enum for aggregated `LlmResponse` objects in streaming mode. This ensures the ADK agent runner accurately recognizes completion states, mirroring behavior from the non-streaming path (see the second sketch below).
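A minimal sketch of the string check described above; the real `_safe_json_serialize` in ADK may differ in details such as error handling:

```python
import json


def _safe_json_serialize(obj) -> str:
  # If the tool response is already a string (e.g., pre-serialized JSON),
  # return it as-is. Serializing it again would escape the quotes and
  # produce nested JSON that models like Claude/GPT fail to parse.
  if isinstance(obj, str):
    return obj
  try:
    return json.dumps(obj, ensure_ascii=False)
  except (TypeError, ValueError):
    return str(obj)
```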
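And a hedged sketch of the `finish_reason` mapping. The enum members come from `google.genai.types.FinishReason`; the exact mapping table and helper name are assumptions for illustration:

```python
from google.genai import types

# Map OpenAI-style finish_reason strings emitted by LiteLLM to the
# genai FinishReason enum used by aggregated LlmResponse objects.
_FINISH_REASON_MAPPING = {
    "stop": types.FinishReason.STOP,
    "tool_calls": types.FinishReason.STOP,
    "length": types.FinishReason.MAX_TOKENS,
    "content_filter": types.FinishReason.SAFETY,
}


def _map_finish_reason(finish_reason: str) -> types.FinishReason:
  # Fall back to OTHER for finish reasons with no direct genai equivalent.
  return _FINISH_REASON_MAPPING.get(finish_reason, types.FinishReason.OTHER)
```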
Hi @thesynapses, thank you for your work on this pull request. We appreciate the effort you've invested. Before we can proceed with the review, could you please fix the lint errors? You can use `autoformat.sh`.