[Bug]: Vertex AI Gemini 1.5 Flash LiteLLM Proxy: Response cut off mid-sentence (sometimes in the middle of a word)
What happened?
When using LiteLLM Proxy with streaming, the response often (around 20% of the time) gets cut off. The model was going to use a tool in that response, but it was cut off before that.
I am using Vertex AI with Gemini 1.5 Flash. There is nothing in the logs, and no errors.
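For context, this is roughly the kind of streaming call that triggers it, a minimal sketch of my setup: the base URL, API key, and prompt below are placeholders, not my exact configuration.

```python
# Minimal repro sketch: streaming chat completion through the LiteLLM Proxy's
# OpenAI-compatible endpoint. base_url, api_key, and the prompt are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-1234")

stream = client.chat.completions.create(
    model="gemini-1.5-flash",
    messages=[{"role": "user", "content": "Summarize this document."}],
    temperature=0,  # temperature 0 for reproducibility, as described above
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
```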
Relevant log output
No response
Here is an example. I repeated the same prompt to see whether it would be cut off in the same way. The temperature is 0 for reproducibility, but the same happens with different values.
Did the stream just end? Can you try sharing an example with `--detailed_debug` enabled, @JamDon2?
IIRC their stream sometimes changes and returns partial JSONs - https://github.com/BerriAI/litellm/blob/0d0f46a826c42f52db56bfdc4e0dbf6913652671/litellm/tests/test_streaming.py#L865
Perhaps this is related to that?
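To illustrate what "partial JSONs" means here (a generic sketch, not LiteLLM's actual parsing code): if the raw stream splits a JSON payload mid-object, `json.loads` fails until the rest arrives, and a parser that treats that failure as end-of-stream would stop early.

```python
import json

# Generic sketch of buffering partial JSON stream chunks; an illustration
# of the failure mode, not LiteLLM's actual parser.
buffer = ""

def feed(raw_chunk: str):
    """Accumulate raw text until it parses as complete JSON."""
    global buffer
    buffer += raw_chunk
    try:
        obj = json.loads(buffer)
    except json.JSONDecodeError:
        return None  # incomplete; wait for the next chunk
    buffer = ""
    return obj

# A payload split mid-object: the first piece alone is invalid JSON.
print(feed('{"candidates": [{"content": '))   # -> None (partial)
print(feed('{"parts": [{"text": "I"}]}}]}'))  # -> parsed dict
```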
I'm currently looking through the logs, and I see this error sometimes:
ValueError: User doesn't exist in db. 'user_id'=admin. Create user via /user/new call.
It appears randomly, not when making a request, and the UI is not open.
This looks like the relevant part. So what this means is that the Vertex AI endpoint returned "I" and then stopped the completion?
INFO: 172.18.0.1:41794 - "POST /v1/chat/completions HTTP/1.1" 200 OK
10:26:04 - LiteLLM Proxy:DEBUG: proxy_server.py:2579 - async_data_generator: received streaming chunk - ModelResponse(id='chatcmpl-ID_REDACTED', choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(content='I', role='assistant', function_call=None, tool_calls=None), logprobs=None)], created=1727605564, model='gemini-1.5-flash', object='chat.completion.chunk', system_fingerprint=None)
10:26:04 - LiteLLM Proxy:DEBUG: proxy_server.py:2579 - async_data_generator: received streaming chunk - ModelResponse(id='chatcmpl-ID_REDACTED', choices=[StreamingChoices(finish_reason='stop', index=0, delta=Delta(content=None, role=None, function_call=None, tool_calls=None), logprobs=None)], created=1727605564, model='gemini-1.5-flash', object='chat.completion.chunk', system_fingerprint=None)
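To make the truncation easier to spot, here's a small client-side sketch (assuming the OpenAI-compatible chunk shape shown in the logs above) that drains the stream and reports the final finish_reason:

```python
# Debugging sketch: accumulate the stream and log the final finish_reason.
# `stream` would come from a client.chat.completions.create(..., stream=True) call.
def drain(stream):
    parts, finish_reason = [], None
    for chunk in stream:
        choice = chunk.choices[0]
        if choice.delta.content:
            parts.append(choice.delta.content)
        if choice.finish_reason is not None:
            finish_reason = choice.finish_reason
    text = "".join(parts)
    # In the failing case above, the stream yields only "I" and then
    # finish_reason='stop', i.e. a "clean" stop after a single token.
    print(f"finish_reason={finish_reason!r}, chars={len(text)}")
    return text
```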
Hmm, none of this explains why a stream would stop. Can you email me ([email protected]) the complete logs, or we can debug over a call? https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.