[Bug]: Broken s3 cache creation with streaming?
What happened?
Caching does not seem to be working with this PoC:
#!/usr/bin/env python3.11
# -*- coding: utf-8 -*-
# Author: David Manouchehri
import os
import asyncio
import openai
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
c_handler = logging.StreamHandler()
logger.addHandler(c_handler)

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_API_BASE = os.getenv("OPENAI_API_BASE") or "https://api.openai.com/v1"

client = openai.AsyncOpenAI(
    api_key=OPENAI_API_KEY,
    base_url=OPENAI_API_BASE,
)


async def main():
    response = await client.chat.completions.create(
        model="gemini-1.5-pro-preview-0409",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "What’s in this image?"
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                        }
                    }
                ]
            }
        ],
        stream=True,
        temperature=0.0,
    )
    logger.debug("Failed to print non-stream")

    current_str = ""
    async for chunk in response:
        logger.debug(chunk)
        if chunk.choices[0].delta.content:
            current_str += chunk.choices[0].delta.content
        logger.debug(current_str)
        logger.debug("---")


if __name__ == "__main__":
    asyncio.run(main())
Caching is working with this:
#!/usr/bin/env python3.11
# -*- coding: utf-8 -*-
# Author: David Manouchehri
import os
import asyncio
import openai
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
c_handler = logging.StreamHandler()
logger.addHandler(c_handler)

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_API_BASE = os.getenv("OPENAI_API_BASE") or "https://api.openai.com/v1"

client = openai.AsyncOpenAI(
    api_key=OPENAI_API_KEY,
    base_url=OPENAI_API_BASE,
)


async def main():
    response = await client.chat.completions.create(
        model="gemini-1.5-pro-preview-0409",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "What’s in this image?"
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                        }
                    }
                ]
            }
        ],
        stream=False,
        temperature=0.0,
    )
    logger.debug(response.model_dump_json(indent=2))


if __name__ == "__main__":
    asyncio.run(main())
Note: if you run the non-streaming script first, the streaming script will then successfully use the cache.
Relevant log output
No response
Twitter / LinkedIn details
https://www.linkedin.com/in/davidmanouchehri/
i don't see how you've set up caching. can you share that too?
litellm_settings:
  drop_params: True
  cache: True
  cache_params:
    type: s3
    s3_bucket_name: os.environ/CACHING_S3_BUCKET_NAME
    s3_region_name: os.environ/CACHING_AWS_DEFAULT_REGION
    s3_aws_access_key_id: os.environ/CACHING_AWS_ACCESS_KEY_ID
    s3_aws_secret_access_key: os.environ/CACHING_AWS_SECRET_ACCESS_KEY
    s3_endpoint_url: os.environ/CACHING_AWS_ENDPOINT_URL_S3
  failure_callback: ["sentry", "langfuse"]
  num_retries_per_request: 3
  success_callback: ["langfuse", "s3"]
  s3_callback_params:
    s3_bucket_name: os.environ/LOGGING_S3_BUCKET_NAME
    s3_region_name: os.environ/LOGGING_AWS_DEFAULT_REGION
    s3_aws_access_key_id: os.environ/LOGGING_AWS_ACCESS_KEY_ID
    s3_aws_secret_access_key: os.environ/LOGGING_AWS_SECRET_ACCESS_KEY
    s3_endpoint_url: os.environ/LOGGING_AWS_ENDPOINT_URL_S3
  default_team_settings:
    - team_id: david_dev
      success_callback: ["langfuse", "s3"]
      langfuse_secret: os.environ/LANGFUSE_PRIVATE_KEY_DAVID
      langfuse_public_key: os.environ/LANGFUSE_PUBLIC_KEY_DAVID

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL
  database_connection_pool_limit: 1
  disable_spend_logs: True

router_settings:
  routing_strategy: simple-shuffle

environment_variables:

model_list:
  - model_name: gemini-1.5-pro-preview-0409
    litellm_params:
      model: vertex_ai/gemini-1.5-pro-preview-0409
      vertex_project: litellm-epic
      vertex_location: northamerica-northeast1
      max_tokens: 8192
  - model_name: gemini-1.5-pro-preview-0409
    litellm_params:
      model: vertex_ai/gemini-1.5-pro-preview-0409
      vertex_project: litellm-epic
      vertex_location: southamerica-east1
      max_tokens: 8192
I am using a key that belongs to david_dev
i believe we have some testing on this. will look into this more
@Manouchehri would help if you could add any bugs you believe we should prioritize to this week's bug bash - https://github.com/BerriAI/litellm/issues/3045
Heading to bed atm, will do tomorrow! Thank you! This one and the s3 team logging are the two highest priorities for me for sure.
Do you want me to maybe create github issue labels for low, medium, high, and critical priorities? That's what my team does for our internal projects. 😀
This is still a bug btw, checked today.
Just double-checked and it's indeed a point of failure. The S3 cache for streaming is not working via the SDK. It's also not working for the local and disk caches. A minimal SDK-level repro sketch follows.
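For anyone trying to reproduce this at the SDK level, here is a minimal sketch under stated assumptions: the model name is a placeholder, and it's shown with the in-memory local cache, but the same shape applies to the disk and s3 backends.

# Hypothetical minimal repro for streaming cache behavior via the SDK.
# The model name is a placeholder; swap type="local" for "disk" or "s3"
# (with credentials) to exercise the other cache backends.
import asyncio
import litellm
from litellm.caching import Cache

litellm.cache = Cache(type="local")


async def main():
    for attempt in (1, 2):  # the 2nd attempt should be served from cache
        response = await litellm.acompletion(
            model="gpt-3.5-turbo",  # placeholder model
            messages=[{"role": "user", "content": "ping"}],
            stream=True,
            temperature=0.0,
        )
        text = ""
        async for chunk in response:
            if chunk.choices[0].delta.content:
                text += chunk.choices[0].delta.content
        print(f"attempt {attempt}: {text}")


asyncio.run(main())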
OK, just double-checked our case. When using completion in a FastAPI async method, it will behave like that. Just change to the async version, acompletion, and it will work (see the sketch below). Probably related to this PR: https://github.com/BerriAI/litellm/pull/4756
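An illustrative sketch of the pattern being described; the route and model name are placeholders, not from this thread. The idea is that calling the sync completion inside an async endpoint reportedly skips the streaming cache write, while acompletion does not.

# Illustrative sketch, assuming a FastAPI app with LiteLLM caching enabled.
import litellm
from fastapi import FastAPI

app = FastAPI()


@app.post("/chat")  # placeholder route
async def chat():
    # Previously: litellm.completion(..., stream=True) -- cache write was skipped.
    # Async variant, which reportedly works:
    response = await litellm.acompletion(
        model="gpt-3.5-turbo",  # placeholder model
        messages=[{"role": "user", "content": "Hello"}],
        stream=True,
        temperature=0.0,
    )
    text = ""
    async for chunk in response:
        if chunk.choices[0].delta.content:
            text += chunk.choices[0].delta.content
    return {"text": text}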
this seems to work on both async+sync
Reopening as it doesn't seem to actually work when running LiteLLM as a proxy server.
@Manouchehri Unable to repro the issue, this works as expected for me -
Config
litellm_settings:
  success_callback: ["langfuse", "prometheus"]
  failure_callback: ["prometheus"]
  cache: true
  cache_params: # set cache params for s3
    type: s3
    s3_bucket_name: cache-bucket-litellm # AWS bucket name for S3
    s3_region_name: us-west-2 # AWS region name for S3
    s3_aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID # use os.environ/<variable name> to pass environment variables. This is the AWS access key ID for S3
    s3_aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY # AWS secret access key for S3
Testing
1st call (no cache hit)
2nd call (cache hit)
CURL
curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
  "messages": [
    {
      "content": "Hey, how'\''s it going 1234?",
      "role": "user"
    }
  ],
  "model": "gpt-3.5-turbo",
  "temperature": 0.7,
  "stream": true
}'
thanks for sharing your script, i'll try that too
Your script works too:
See the 2nd run emitting just 2 chunks:
(base) krrishdholakia@Krrishs-MacBook-Air temp_py_folder % python3 test_openai_caching_streaming.py
Failed to print non-stream
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content='Hello ', function_call=None, refusal=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449123, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello
---
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content='this ', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449123, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this
---
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content='is ', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449123, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this is
---
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content='a ', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449123, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this is a
---
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content='test ', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449123, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this is a test
---
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content='response ', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449123, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this is a test response
---
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content='from ', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449123, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this is a test response from
---
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content='a ', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449123, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this is a test response from a
---
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content='fixed ', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449123, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this is a test response from a fixed
---
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content='OpenAI ', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449123, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this is a test response from a fixed OpenAI
---
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content='endpoint. ', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449123, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this is a test response from a fixed OpenAI endpoint.
---
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1727449123, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this is a test response from a fixed OpenAI endpoint.
---
(base) krrishdholakia@Krrishs-MacBook-Air temp_py_folder % python3 test_openai_caching_streaming.py
Failed to print non-stream
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content='Hello this is a test response from a fixed OpenAI endpoint. ', function_call=None, refusal=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449128, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this is a test response from a fixed OpenAI endpoint.
---
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1727449128, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this is a test response from a fixed OpenAI endpoint.
---
Failed to print non-stream
I'm not sure what this debug statement is for; we return cache hits in streaming form if the original request is a stream, so your client code isn't impacted.
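Conceptually, a cache hit can be replayed to a streaming client along these lines. This is an illustrative sketch only, not LiteLLM's actual internals, and the chunk size is arbitrary:

# Illustrative only: replay a cached full completion as a stream of chunks,
# so a client that requested stream=True still gets an async iterator.
async def replay_cached_as_stream(cached_text: str, chunk_size: int = 64):
    for i in range(0, len(cached_text), chunk_size):
        yield cached_text[i:i + chunk_size]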
Works on model=vertex_ai/gemini-1.5-pro-preview-0409
(base) krrishdholakia@Krrishs-MacBook-Air temp_py_folder % python3 test_openai_caching_streaming.py
Failed to print non-stream
ChatCompletionChunk(id='chatcmpl-f19743a9-8655-4934-9a5c-8276b0ad58c0', choices=[Choice(delta=ChoiceDelta(content='The', function_call=None, refusal=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449328, model='gemini-1.5-pro-preview-0409', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
The
---
ChatCompletionChunk(id='chatcmpl-f19743a9-8655-4934-9a5c-8276b0ad58c0', choices=[Choice(delta=ChoiceDelta(content=' image shows a wooden boardwalk winding its way through a vibrant green marsh. The boardwalk', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449328, model='gemini-1.5-pro-preview-0409', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
The image shows a wooden boardwalk winding its way through a vibrant green marsh. The boardwalk
---
ChatCompletionChunk(id='chatcmpl-f19743a9-8655-4934-9a5c-8276b0ad58c0', choices=[Choice(delta=ChoiceDelta(content=' stretches from the foreground into the distance, inviting a stroll amidst the tall grasses.', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449328, model='gemini-1.5-pro-preview-0409', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
The image shows a wooden boardwalk winding its way through a vibrant green marsh. The boardwalk stretches from the foreground into the distance, inviting a stroll amidst the tall grasses.
---
ChatCompletionChunk(id='chatcmpl-f19743a9-8655-4934-9a5c-8276b0ad58c0', choices=[Choice(delta=ChoiceDelta(content=' The lush greenery of the marsh extends on either side of the boardwalk, creating a sense of tranquility and natural beauty. The sky above is a stunning blue, dotted', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449328, model='gemini-1.5-pro-preview-0409', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
The image shows a wooden boardwalk winding its way through a vibrant green marsh. The boardwalk stretches from the foreground into the distance, inviting a stroll amidst the tall grasses. The lush greenery of the marsh extends on either side of the boardwalk, creating a sense of tranquility and natural beauty. The sky above is a stunning blue, dotted
---
ChatCompletionChunk(id='chatcmpl-f19743a9-8655-4934-9a5c-8276b0ad58c0', choices=[Choice(delta=ChoiceDelta(content=' with fluffy white clouds, adding to the serene atmosphere of the scene. \n', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449328, model='gemini-1.5-pro-preview-0409', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
The image shows a wooden boardwalk winding its way through a vibrant green marsh. The boardwalk stretches from the foreground into the distance, inviting a stroll amidst the tall grasses. The lush greenery of the marsh extends on either side of the boardwalk, creating a sense of tranquility and natural beauty. The sky above is a stunning blue, dotted with fluffy white clouds, adding to the serene atmosphere of the scene.
---
ChatCompletionChunk(id='chatcmpl-f19743a9-8655-4934-9a5c-8276b0ad58c0', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1727449328, model='gemini-1.5-pro-preview-0409', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
The image shows a wooden boardwalk winding its way through a vibrant green marsh. The boardwalk stretches from the foreground into the distance, inviting a stroll amidst the tall grasses. The lush greenery of the marsh extends on either side of the boardwalk, creating a sense of tranquility and natural beauty. The sky above is a stunning blue, dotted with fluffy white clouds, adding to the serene atmosphere of the scene.
---
(base) krrishdholakia@Krrishs-MacBook-Air temp_py_folder % python3 test_openai_caching_streaming.py
Failed to print non-stream
ChatCompletionChunk(id='chatcmpl-f19743a9-8655-4934-9a5c-8276b0ad58c0', choices=[Choice(delta=ChoiceDelta(content='The image shows a wooden boardwalk winding its way through a vibrant green marsh. The boardwalk stretches from the foreground into the distance, inviting a stroll amidst the tall grasses. The lush greenery of the marsh extends on either side of the boardwalk, creating a sense of tranquility and natural beauty. The sky above is a stunning blue, dotted with fluffy white clouds, adding to the serene atmosphere of the scene. \n', function_call=None, refusal=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449330, model='gemini-1.5-pro-preview-0409', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
The image shows a wooden boardwalk winding its way through a vibrant green marsh. The boardwalk stretches from the foreground into the distance, inviting a stroll amidst the tall grasses. The lush greenery of the marsh extends on either side of the boardwalk, creating a sense of tranquility and natural beauty. The sky above is a stunning blue, dotted with fluffy white clouds, adding to the serene atmosphere of the scene.
---
ChatCompletionChunk(id='chatcmpl-f19743a9-8655-4934-9a5c-8276b0ad58c0', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1727449330, model='gemini-1.5-pro-preview-0409', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
The image shows a wooden boardwalk winding its way through a vibrant green marsh. The boardwalk stretches from the foreground into the distance, inviting a stroll amidst the tall grasses. The lush greenery of the marsh extends on either side of the boardwalk, creating a sense of tranquility and natural beauty. The sky above is a stunning blue, dotted with fluffy white clouds, adding to the serene atmosphere of the scene.
---
(base) krrishdholakia@Krrishs-MacBook-Air temp_py_folder %