[Bug]: Vertex AI Gemini 1.5 Flash LiteLLM Proxy: Response cut off mid-sentence (sometimes in the middle of a word)
What happened?
When using LiteLLM Proxy with streaming, the response often (around 20% of the time) gets cut off. The model was going to use a tool in that response, but it was cut off before that.
I am using Vertex AI with Gemini 1.5 Flash. There is nothing in the logs, and no errors.
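For context, this is roughly the kind of streaming call that triggers it, a minimal sketch of my setup: the base URL, API key, and prompt below are placeholders, not my exact configuration.

```python
# Minimal repro sketch: streaming chat completion through the LiteLLM Proxy's
# OpenAI-compatible endpoint. base_url, api_key, and the prompt are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000/v1", api_key="sk-1234")

stream = client.chat.completions.create(
    model="gemini-1.5-flash",
    messages=[{"role": "user", "content": "Summarize this document."}],
    temperature=0,  # temperature 0 for reproducibility, as described above
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
```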
Relevant log output
No response
Here is an example. I repeated the same prompt to see whether it would be cut off in the same way. The temperature is 0 for reproducibility, but the same happens with different values.
Did the stream just end? Can you try sharing an example with `--detailed_debug` enabled, @JamDon2?
IIRC their stream sometimes changes and returns partial JSONs - https://github.com/BerriAI/litellm/blob/0d0f46a826c42f52db56bfdc4e0dbf6913652671/litellm/tests/test_streaming.py#L865
Perhaps this is related to that?
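To illustrate what "partial JSONs" means here (a generic sketch, not LiteLLM's actual parsing code): if the raw stream splits a JSON payload mid-object, `json.loads` fails until the rest arrives, and a parser that treats that failure as end-of-stream would stop early.

```python
import json

# Generic sketch of buffering partial JSON stream chunks; an illustration
# of the failure mode, not LiteLLM's actual parser.
buffer = ""

def feed(raw_chunk: str):
    """Accumulate raw text until it parses as complete JSON."""
    global buffer
    buffer += raw_chunk
    try:
        obj = json.loads(buffer)
    except json.JSONDecodeError:
        return None  # incomplete; wait for the next chunk
    buffer = ""
    return obj

# A payload split mid-object: the first piece alone is invalid JSON.
print(feed('{"candidates": [{"content": '))   # -> None (partial)
print(feed('{"parts": [{"text": "I"}]}}]}'))  # -> parsed dict
```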
I'm currently looking through the logs, and I see this error sometimes:
ValueError: User doesn't exist in db. 'user_id'=admin. Create user via /user/new call.
It appears randomly, not when making a request, and the UI is not open.
This looks like the relevant part. So what this means is that the Vertex AI endpoint returned "I" and then stopped the completion?
INFO: 172.18.0.1:41794 - "POST /v1/chat/completions HTTP/1.1" 200 OK
10:26:04 - LiteLLM Proxy:DEBUG: proxy_server.py:2579 - async_data_generator: received streaming chunk - ModelResponse(id='chatcmpl-ID_REDACTED', choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(content='I', role='assistant', function_call=None, tool_calls=None), logprobs=None)], created=1727605564, model='gemini-1.5-flash', object='chat.completion.chunk', system_fingerprint=None)
10:26:04 - LiteLLM Proxy:DEBUG: proxy_server.py:2579 - async_data_generator: received streaming chunk - ModelResponse(id='chatcmpl-ID_REDACTED', choices=[StreamingChoices(finish_reason='stop', index=0, delta=Delta(content=None, role=None, function_call=None, tool_calls=None), logprobs=None)], created=1727605564, model='gemini-1.5-flash', object='chat.completion.chunk', system_fingerprint=None)
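To make the truncation easier to spot, here's a small client-side sketch (assuming the OpenAI-compatible chunk shape shown in the logs above) that drains the stream and reports the final finish_reason:

```python
# Debugging sketch: accumulate the stream and log the final finish_reason.
# `stream` would come from a client.chat.completions.create(..., stream=True) call.
def drain(stream):
    parts, finish_reason = [], None
    for chunk in stream:
        choice = chunk.choices[0]
        if choice.delta.content:
            parts.append(choice.delta.content)
        if choice.finish_reason is not None:
            finish_reason = choice.finish_reason
    text = "".join(parts)
    # In the failing case above, the stream yields only "I" and then
    # finish_reason='stop', i.e. a "clean" stop after a single token.
    print(f"finish_reason={finish_reason!r}, chars={len(text)}")
    return text
```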
Hmm, none of this explains why a stream would stop. Can you email me ([email protected]) the complete logs, or we can debug over a call? https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.