
fix(vertex_ai): Resolve JSONDecodeError in Gemini streaming

Open · AlanPonnachan opened this pull request 1 month ago

Resolve JSONDecodeError in Gemini streaming

Relevant issues

Fixes #16562

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • [x] I have added testing in the tests/litellm/ directory (adding at least 1 test is a hard requirement - see details)
  • [x] I have added a screenshot of my new test passing locally
  • [x] My PR passes all unit tests on make test-unit
  • [x] My PR's scope is as isolated as possible, it only solves 1 specific problem
[Screenshot: new tests passing locally]

Type

🐛 Bug Fix ✅ Test

Changes

Problem

The streaming parser for Vertex AI Gemini models (ModelResponseIterator) would crash with a JSONDecodeError if a partial (fragmented) JSON chunk was received from the stream after the first complete chunk had already been processed. This caused intermittent but critical failures in production environments.
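
To make the failure mode concrete, here is a minimal illustration (with made-up chunk contents) of how the transport layer can split a JSON object across stream chunks, so that `json.loads` succeeds on the first chunk but fails on a later fragment:

```python
import json

# Hypothetical chunk sequence: one complete response object, then a second
# object split across two transport chunks. Chunk boundaries are not under
# the parser's control.
chunks = [
    '{"candidates": [{"content": {"parts": [{"text": "Hello"}]}}]}',
    '{"candidates": [{"content": {"parts": [{"te',   # fragment: invalid JSON on its own
    'xt": " world"}]}}]}',                           # remainder of the same object
]

json.loads(chunks[0])  # parses fine
json.loads(chunks[1])  # raises json.JSONDecodeError
```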

Root Cause

The error handling logic in the handle_valid_json_chunk method contained a guard condition (if self.sent_first_chunk is False:). This condition only allowed the JSON accumulation/buffering logic to be triggered for the very first chunk in the stream. If any subsequent chunk was fragmented, the condition would be false, and the JSONDecodeError would be re-raised instead of handled.
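
For reference, a minimal sketch of the pre-fix control flow described above; the method and attribute names follow this PR description, but the class body is simplified and is not the actual LiteLLM source:

```python
import json

class PreFixSketch:
    """Illustrative stand-in for ModelResponseIterator's pre-fix error handling."""

    def __init__(self):
        self.sent_first_chunk = False  # flips to True once the first chunk is emitted

    def handle_accumulated_json_chunk(self, chunk: str):
        return None  # buffering/accumulation path, elided in this sketch

    def handle_valid_json_chunk(self, chunk: str):
        try:
            return json.loads(chunk)
        except json.JSONDecodeError as e:
            if self.sent_first_chunk is False:
                # Only the very first chunk could fall back to buffering.
                return self.handle_accumulated_json_chunk(chunk)
            # Any later fragment re-raised and aborted the stream.
            raise e
```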

Solution

The fix removes the self.sent_first_chunk is False guard. Now, any JSONDecodeError will correctly trigger the switch to the JSON accumulation mode (handle_accumulated_json_chunk). This makes the stream parser robust and allows it to correctly buffer and assemble fragmented JSON objects at any point during the stream, not just at the beginning.
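
Under the same simplified sketch, the fixed handler routes every decode failure to the accumulation path (the buffer attribute name here is hypothetical, for illustration only):

```python
import json

class PostFixSketch:
    """Illustrative stand-in for the fixed error handling; buffering details are assumed."""

    def __init__(self):
        self.accumulated_json = ""  # hypothetical buffer for this sketch

    def handle_accumulated_json_chunk(self, chunk: str):
        # Append the fragment and retry; emit only once the object is complete.
        self.accumulated_json += chunk
        try:
            parsed = json.loads(self.accumulated_json)
            self.accumulated_json = ""
            return parsed
        except json.JSONDecodeError:
            return None  # still incomplete; wait for the next chunk

    def handle_valid_json_chunk(self, chunk: str):
        try:
            return json.loads(chunk)
        except json.JSONDecodeError:
            # No sent_first_chunk guard: any decode failure, at any point in
            # the stream, switches to accumulation mode.
            return self.handle_accumulated_json_chunk(chunk)
```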

Testing

  • Added two new unit tests to tests/test_litellm/llms/vertex_ai/gemini/test_vertex_and_google_ai_studio_gemini.py:
    • test_streaming_iterator_handles_partial_json_after_first_chunk_sync: Verifies the fix for synchronous streams.
    • test_streaming_iterator_handles_partial_json_after_first_chunk_async: Verifies the fix for asynchronous streams.
  • These tests simulate a stream where a complete JSON chunk is followed by a fragmented one. Before the fix they failed with a `RuntimeError: Error parsing chunk...`; they now pass, confirming the bug is resolved (see the illustrative sketch after this list).
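
The sketch below illustrates the scenario the new tests cover; it is not the actual test code from this PR, just a self-contained demonstration of reassembling a fragmented chunk that arrives after a complete one:

```python
import json

def test_partial_json_after_first_chunk_sketch():
    # Illustrative only - not the tests added in this PR.
    complete = '{"text": "Hello"}'
    fragment_a = '{"text": "wor'
    fragment_b = 'ld"}'

    buffer = ""
    results = []
    for chunk in [complete, fragment_a, fragment_b]:
        try:
            results.append(json.loads(buffer + chunk))
            buffer = ""
        except json.JSONDecodeError:
            buffer += chunk  # keep accumulating until the object closes

    assert results == [{"text": "Hello"}, {"text": "world"}]
```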

AlanPonnachan · Nov 13 '25

@AlanPonnachan is attempting to deploy a commit to the CLERKIEAI Team on Vercel.

A member of the Team first needs to authorize it.

vercel[bot] · Nov 13 '25

@krrishdholakia please review

AlanPonnachan · Nov 21 '25