Add in Bedrock Mistral Streaming fix for litellm proxy
With v1.30.3, the internal litellm response operations supported streaming; however, OpenAI-compatible streaming calls against the litellm proxy returned empty responses. Upon further inspection, the streamed response Bedrock sends for the Mistral models is found in `chunk_data['outputs'][0]['text']`. This PR adds a condition to the Bedrock stream handling in the utils to cover the Mistral streaming format.
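For illustration only, the shape of the condition is roughly the following (the function name and `provider` argument are made up for this sketch; the actual change lives in the Bedrock stream handling in litellm's utils and has a different signature):

```python
import json

def parse_bedrock_stream_chunk(chunk_bytes: bytes, provider: str) -> str:
    """Illustrative sketch: pull the text delta out of one Bedrock streaming chunk."""
    chunk_data = json.loads(chunk_bytes.decode("utf-8"))
    if provider == "mistral":
        # Mistral models stream their text under outputs[0]["text"]
        return chunk_data["outputs"][0]["text"]
    # Other Bedrock providers use different keys, e.g. "completion" or "outputText"
    return chunk_data.get("completion") or chunk_data.get("outputText", "")
```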
| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| litellm | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Mar 11, 2024 3:34pm |
| litellm-dashboard | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Mar 11, 2024 3:34pm |
@sean-bailey could you add a test for this here - https://github.com/BerriAI/litellm/blob/713f5991b8528a311b878886a2c455e68d639077/litellm/tests/test_bedrock_completion.py#L4
Bonus if you can attach a screenshot of it working for you.
Side note: DMed you on LinkedIn to learn how you're using the proxy!
Would love to chat if you have ~10 mins this/next week? https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
I don't see other tests for streaming in that file, but I can provide the OpenAI-compatible code I used to get streaming to work with the proxy.
from openai import OpenAI

# Point the OpenAI client at the locally running litellm proxy.
endpointUrl = "http://localhost:8000/v1"
yourAPIKey = "gsdfgsdfg"  # placeholder key - the proxy doesn't need a real OpenAI key here

agentprompt = "You are a helpful assistant."
prompt_message = "What is the capital of France?"

client = OpenAI(api_key=yourAPIKey, base_url=endpointUrl)

stream = client.chat.completions.create(
    model="mixtral-8x7b-instruct-v0:1",
    messages=[
        {"content": agentprompt, "role": "system"},
        {"content": prompt_message, "role": "user"},
    ],
    stream=True,
)

for chunk in stream:
    # The final chunk's delta content is None, so guard before printing.
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
You should see something similar to this streamed output:
The capital of France is Paris. Paris is a major European city and a global center for art, fashion, gastronomy, and culture. It is located along the Seine River, in the north of France. The city is divided into 20 arrondissements, or districts, and is well known for its beautiful architecture, museums, and landmarks such as the Eiffel Tower, the Louvre Museum, the Notre-Dame Cathedral, and the Palace of Versailles. Paris is also home to many prestigious universities and research institutions, making it a hub for education and innovation.
Streaming would look better on video, but this code should be repeatable for testing, unless you'd like it inside the test file.
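If it does belong in the test file, a rough sketch of what a streaming test could look like is below (this is just an illustration under my assumptions: the test name and assertion are placeholders, not the repo's existing test conventions, and running it needs live Bedrock credentials, so it may have to be skipped in CI):

```python
import litellm

def test_completion_bedrock_mistral_streaming():
    # Stream a short completion from the Bedrock Mistral model and make sure
    # the chunks actually carry text (the bug was empty streamed responses).
    response = litellm.completion(
        model="bedrock/mistral.mixtral-8x7b-instruct-v0:1",
        messages=[{"role": "user", "content": "What is the capital of France?"}],
        stream=True,
    )
    collected = ""
    for chunk in response:
        delta = chunk.choices[0].delta.content
        if delta is not None:
            collected += delta
    assert len(collected) > 0
```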
The config file I used for running this locally was pretty straightforward:
model_list:
  - model_name: mixtral-8x7b-instruct-v0:1
    litellm_params:
      model: "bedrock/mistral.mixtral-8x7b-instruct-v0:1"
      aws_region_name: "us-west-2"

litellm_settings: # module level litellm settings - https://github.com/BerriAI/litellm/blob/main/litellm/__init__.py
  drop_params: True
  set_verbose: True
I set things up with `pip install litellm[proxy]` and ran it with `litellm --config config.yaml`.
Related: #2464
Will review this and either merge the PR or push a fix for the issue this week @sean-bailey @GlavitsBalazs
Hey @sean-bailey @GlavitsBalazs this should be fixed in v1.34.13
Let me know if this error persists for y'all