
[Bug]: Broken s3 cache creation with streaming?

Open Manouchehri opened this issue 10 months ago • 6 comments

What happened?

Caching does not seem to be working with this PoC:

#!/usr/bin/env python3.11
# -*- coding: utf-8 -*-
# Author: David Manouchehri

import os
import asyncio
import openai
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

c_handler = logging.StreamHandler()
logger.addHandler(c_handler)

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_API_BASE = os.getenv("OPENAI_API_BASE") or "https://api.openai.com/v1"

client = openai.AsyncOpenAI(
    api_key=OPENAI_API_KEY,
    base_url=OPENAI_API_BASE,
)

async def main():
    response = await client.chat.completions.create(
        model="gemini-1.5-pro-preview-0409",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "What’s in this image?"
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                        }
                    }
                ]
            }
        ],
        stream=True,
        temperature=0.0,
    )
    
    logger.debug("Failed to print non-stream")

    current_str = ""
    async for chunk in response:
        logger.debug(chunk)
        if chunk.choices[0].delta.content:
            current_str += chunk.choices[0].delta.content
        
        logger.debug(current_str)
        logger.debug("---")


if __name__ == "__main__":
    asyncio.run(main())

Caching is working with this:

#!/usr/bin/env python3.11
# -*- coding: utf-8 -*-
# Author: David Manouchehri

import os
import asyncio
import openai
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

c_handler = logging.StreamHandler()
logger.addHandler(c_handler)

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_API_BASE = os.getenv("OPENAI_API_BASE") or "https://api.openai.com/v1"

client = openai.AsyncOpenAI(
    api_key=OPENAI_API_KEY,
    base_url=OPENAI_API_BASE,
)

async def main():
    response = await client.chat.completions.create(
        model="gemini-1.5-pro-preview-0409",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "What’s in this image?"
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                        }
                    }
                ]
            }
        ],
        stream=False,
        temperature=0.0,
    )

    logger.debug(response.model_dump_json(indent=2))

if __name__ == "__main__":
    asyncio.run(main())

Note: if you run the non-streaming script, then the streaming script will successfully use the cache.
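
In case it helps, here's a rough sketch of how I'm checking whether the streaming run actually writes anything to the bucket (a hypothetical verification helper, assuming the same CACHING_* environment variables as my proxy config, shared below):

#!/usr/bin/env python3.11
# Hypothetical helper to verify cache writes; not part of litellm itself.
import os
import boto3

s3 = boto3.client(
    "s3",
    region_name=os.environ["CACHING_AWS_DEFAULT_REGION"],
    aws_access_key_id=os.environ["CACHING_AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["CACHING_AWS_SECRET_ACCESS_KEY"],
    endpoint_url=os.environ["CACHING_AWS_ENDPOINT_URL_S3"],
)

# After the streaming script runs, a new cache object should show up here
# if the cache write succeeded; after the non-streaming script, it does.
resp = s3.list_objects_v2(Bucket=os.environ["CACHING_S3_BUCKET_NAME"])
for obj in sorted(resp.get("Contents", []), key=lambda o: o["LastModified"]):
    print(obj["LastModified"], obj["Key"])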

Relevant log output

No response

Twitter / LinkedIn details

https://www.linkedin.com/in/davidmanouchehri/

Manouchehri avatar Apr 24 '24 15:04 Manouchehri

i don't see how you've set up caching. can you share that too?

krrishdholakia avatar Apr 24 '24 15:04 krrishdholakia

litellm_settings:
  drop_params: True
  cache: True
  cache_params:
    type: s3
    s3_bucket_name: os.environ/CACHING_S3_BUCKET_NAME
    s3_region_name: os.environ/CACHING_AWS_DEFAULT_REGION
    s3_aws_access_key_id: os.environ/CACHING_AWS_ACCESS_KEY_ID
    s3_aws_secret_access_key: os.environ/CACHING_AWS_SECRET_ACCESS_KEY
    s3_endpoint_url: os.environ/CACHING_AWS_ENDPOINT_URL_S3
  failure_callback: ["sentry", "langfuse"]
  num_retries_per_request: 3
  success_callback: ["langfuse", "s3"]
  s3_callback_params:
    s3_bucket_name: os.environ/LOGGING_S3_BUCKET_NAME
    s3_region_name: os.environ/LOGGING_AWS_DEFAULT_REGION
    s3_aws_access_key_id: os.environ/LOGGING_AWS_ACCESS_KEY_ID
    s3_aws_secret_access_key: os.environ/LOGGING_AWS_SECRET_ACCESS_KEY
    s3_endpoint_url: os.environ/LOGGING_AWS_ENDPOINT_URL_S3
  default_team_settings:
    - team_id: david_dev
      success_callback: ["langfuse", "s3"]
      langfuse_secret: os.environ/LANGFUSE_PRIVATE_KEY_DAVID
      langfuse_public_key: os.environ/LANGFUSE_PUBLIC_KEY_DAVID

general_settings: 
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL
  database_connection_pool_limit: 1
  disable_spend_logs: True

router_settings:
  routing_strategy: simple-shuffle

environment_variables:

model_list:
  - model_name: gemini-1.5-pro-preview-0409
    litellm_params:
      model: vertex_ai/gemini-1.5-pro-preview-0409
      vertex_project: litellm-epic
      vertex_location: northamerica-northeast1
      max_tokens: 8192


  - model_name: gemini-1.5-pro-preview-0409
    litellm_params:
      model: vertex_ai/gemini-1.5-pro-preview-0409
      vertex_project: litellm-epic
      vertex_location: southamerica-east1
      max_tokens: 8192

I am using a key that belongs to david_dev.

Manouchehri avatar Apr 24 '24 15:04 Manouchehri

i believe we have some testing on this. will look into this more

krrishdholakia avatar Apr 25 '24 06:04 krrishdholakia

@Manouchehri would help if you could add any bugs you believe we should prioritize to this week's bug bash - https://github.com/BerriAI/litellm/issues/3045

krrishdholakia avatar Apr 25 '24 06:04 krrishdholakia

Heading to bed atm, will do tomorrow! Thank you! This one and the s3 team logging are the two highest priorities for me for sure.

Do you want me to maybe create github issue labels for low, medium, high, and critical priorities? That's what my team does for our internal projects. 😀

Manouchehri avatar Apr 25 '24 06:04 Manouchehri

This is still a bug btw, checked today.

Manouchehri avatar May 03 '24 05:05 Manouchehri

Just double-checked and it's indeed a point of failure: the S3 cache for streaming is not working via the SDK. It's also not working for the local and disk cache types.

danielbichuetti avatar Aug 18 '24 21:08 danielbichuetti

Ok, just double-checked our case. When calling completion inside a FastAPI async method, it behaves like that. Switch to the async version, acompletion, and it works (rough sketch of both patterns below). Probably related to this PR: https://github.com/BerriAI/litellm/pull/4756
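
A minimal sketch of what I mean (the /broken and /working endpoint names are made up for illustration; assumes an S3 cache is already configured on the litellm side):

import litellm
from fastapi import FastAPI

app = FastAPI()

@app.post("/broken")
async def broken():
    # Sync completion() inside an async route: in our tests the
    # streamed response never made it into the cache.
    resp = litellm.completion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "hi"}],
        stream=True,
    )
    return "".join(chunk.choices[0].delta.content or "" for chunk in resp)

@app.post("/working")
async def working():
    # Async acompletion(): the cache write happens as expected.
    resp = await litellm.acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "hi"}],
        stream=True,
    )
    out = ""
    async for chunk in resp:
        out += chunk.choices[0].delta.content or ""
    return out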

danielbichuetti avatar Aug 18 '24 22:08 danielbichuetti

this seems to work on both async+sync

krrishdholakia avatar Aug 19 '24 17:08 krrishdholakia

Reopening as it doesn't seem to actually work when running LiteLLM as a proxy server.

Manouchehri avatar Sep 27 '24 10:09 Manouchehri

@Manouchehri Unable to repro the issue; this works as expected for me -

Config

litellm_settings:
  success_callback: ["langfuse", "prometheus"]
  failure_callback: ["prometheus"]
  cache: true
  cache_params:        # set cache params for s3
    type: s3
    s3_bucket_name: cache-bucket-litellm   # AWS Bucket Name for S3
    s3_region_name: us-west-2              # AWS Region Name for S3
    s3_aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID  # use os.environ/<variable name> to pass environment variables. This is the AWS Access Key ID for S3
    s3_aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY  # AWS Secret Access Key for S3

Testing

1st call (no cache hit)


2nd call (cache hit)


CURL

curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-1234' \
-d '{
    "messages": [
        {
            "content": "Hey, how'\''s it going 1234?",
            "role": "user"
        }
    ],
    "model": "gpt-3.5-turbo",
    "temperature": 0.7,
    "stream": true
}'

krrishdholakia avatar Sep 27 '24 14:09 krrishdholakia

thanks for sharing your script, i'll try that too

krrishdholakia avatar Sep 27 '24 14:09 krrishdholakia

Your script works too:

See the 2nd run emitting just 2 chunks:

(base) krrishdholakia@Krrishs-MacBook-Air temp_py_folder % python3 test_openai_caching_streaming.py 
Failed to print non-stream
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content='Hello ', function_call=None, refusal=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449123, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello 
---
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content='this ', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449123, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this 
---
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content='is ', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449123, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this is 
---
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content='a ', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449123, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this is a 
---
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content='test ', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449123, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this is a test 
---
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content='response ', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449123, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this is a test response 
---
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content='from ', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449123, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this is a test response from 
---
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content='a ', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449123, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this is a test response from a 
---
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content='fixed ', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449123, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this is a test response from a fixed 
---
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content='OpenAI ', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449123, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this is a test response from a fixed OpenAI 
---
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content='endpoint. ', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449123, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this is a test response from a fixed OpenAI endpoint. 
---
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1727449123, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this is a test response from a fixed OpenAI endpoint. 
---
(base) krrishdholakia@Krrishs-MacBook-Air temp_py_folder % python3 test_openai_caching_streaming.py
Failed to print non-stream
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content='Hello this is a test response from a fixed OpenAI endpoint. ', function_call=None, refusal=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449128, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this is a test response from a fixed OpenAI endpoint. 
---
ChatCompletionChunk(id='chatcmpl-6994f9ad9ede45f99933891ae79c1c23', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1727449128, model='gpt-3.5-turbo', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
Hello this is a test response from a fixed OpenAI endpoint. 
---

krrishdholakia avatar Sep 27 '24 14:09 krrishdholakia

Failed to print non-stream

I'm not sure what this debug statement is for. We return cache hits in streaming form if the original request is a stream, so your client code isn't impacted.
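
Roughly speaking (an illustrative sketch, not our actual implementation), a cache hit on a streamed request just gets replayed to the client as a short stream, which is why you see two chunks on the second run:

import asyncio

async def replay_cached_as_stream(cached_text: str):
    # One chunk carrying the entire cached completion...
    yield {"choices": [{"delta": {"role": "assistant", "content": cached_text}}]}
    # ...followed by a final chunk carrying the stop reason.
    yield {"choices": [{"delta": {}, "finish_reason": "stop"}]}

async def main():
    async for chunk in replay_cached_as_stream("Hello from the cache."):
        print(chunk)

if __name__ == "__main__":
    asyncio.run(main())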

krrishdholakia avatar Sep 27 '24 14:09 krrishdholakia

Works on model=vertex_ai/gemini-1.5-pro-preview-0409

(base) krrishdholakia@Krrishs-MacBook-Air temp_py_folder % python3 test_openai_caching_streaming.py
Failed to print non-stream
ChatCompletionChunk(id='chatcmpl-f19743a9-8655-4934-9a5c-8276b0ad58c0', choices=[Choice(delta=ChoiceDelta(content='The', function_call=None, refusal=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449328, model='gemini-1.5-pro-preview-0409', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
The
---
ChatCompletionChunk(id='chatcmpl-f19743a9-8655-4934-9a5c-8276b0ad58c0', choices=[Choice(delta=ChoiceDelta(content=' image shows a wooden boardwalk winding its way through a vibrant green marsh. The boardwalk', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449328, model='gemini-1.5-pro-preview-0409', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
The image shows a wooden boardwalk winding its way through a vibrant green marsh. The boardwalk
---
ChatCompletionChunk(id='chatcmpl-f19743a9-8655-4934-9a5c-8276b0ad58c0', choices=[Choice(delta=ChoiceDelta(content=' stretches from the foreground into the distance, inviting a stroll amidst the tall grasses.', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449328, model='gemini-1.5-pro-preview-0409', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
The image shows a wooden boardwalk winding its way through a vibrant green marsh. The boardwalk stretches from the foreground into the distance, inviting a stroll amidst the tall grasses.
---
ChatCompletionChunk(id='chatcmpl-f19743a9-8655-4934-9a5c-8276b0ad58c0', choices=[Choice(delta=ChoiceDelta(content=' The lush greenery of the marsh extends on either side of the boardwalk, creating a sense of tranquility and natural beauty. The sky above is a stunning blue, dotted', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449328, model='gemini-1.5-pro-preview-0409', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
The image shows a wooden boardwalk winding its way through a vibrant green marsh. The boardwalk stretches from the foreground into the distance, inviting a stroll amidst the tall grasses. The lush greenery of the marsh extends on either side of the boardwalk, creating a sense of tranquility and natural beauty. The sky above is a stunning blue, dotted
---
ChatCompletionChunk(id='chatcmpl-f19743a9-8655-4934-9a5c-8276b0ad58c0', choices=[Choice(delta=ChoiceDelta(content=' with fluffy white clouds, adding to the serene atmosphere of the scene. \n', function_call=None, refusal=None, role=None, tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449328, model='gemini-1.5-pro-preview-0409', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
The image shows a wooden boardwalk winding its way through a vibrant green marsh. The boardwalk stretches from the foreground into the distance, inviting a stroll amidst the tall grasses. The lush greenery of the marsh extends on either side of the boardwalk, creating a sense of tranquility and natural beauty. The sky above is a stunning blue, dotted with fluffy white clouds, adding to the serene atmosphere of the scene. 

---
ChatCompletionChunk(id='chatcmpl-f19743a9-8655-4934-9a5c-8276b0ad58c0', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1727449328, model='gemini-1.5-pro-preview-0409', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
The image shows a wooden boardwalk winding its way through a vibrant green marsh. The boardwalk stretches from the foreground into the distance, inviting a stroll amidst the tall grasses. The lush greenery of the marsh extends on either side of the boardwalk, creating a sense of tranquility and natural beauty. The sky above is a stunning blue, dotted with fluffy white clouds, adding to the serene atmosphere of the scene. 

---
(base) krrishdholakia@Krrishs-MacBook-Air temp_py_folder % python3 test_openai_caching_streaming.py
Failed to print non-stream
ChatCompletionChunk(id='chatcmpl-f19743a9-8655-4934-9a5c-8276b0ad58c0', choices=[Choice(delta=ChoiceDelta(content='The image shows a wooden boardwalk winding its way through a vibrant green marsh. The boardwalk stretches from the foreground into the distance, inviting a stroll amidst the tall grasses. The lush greenery of the marsh extends on either side of the boardwalk, creating a sense of tranquility and natural beauty. The sky above is a stunning blue, dotted with fluffy white clouds, adding to the serene atmosphere of the scene. \n', function_call=None, refusal=None, role='assistant', tool_calls=None), finish_reason=None, index=0, logprobs=None)], created=1727449330, model='gemini-1.5-pro-preview-0409', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
The image shows a wooden boardwalk winding its way through a vibrant green marsh. The boardwalk stretches from the foreground into the distance, inviting a stroll amidst the tall grasses. The lush greenery of the marsh extends on either side of the boardwalk, creating a sense of tranquility and natural beauty. The sky above is a stunning blue, dotted with fluffy white clouds, adding to the serene atmosphere of the scene. 

---
ChatCompletionChunk(id='chatcmpl-f19743a9-8655-4934-9a5c-8276b0ad58c0', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role=None, tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1727449330, model='gemini-1.5-pro-preview-0409', object='chat.completion.chunk', service_tier=None, system_fingerprint=None, usage=None)
The image shows a wooden boardwalk winding its way through a vibrant green marsh. The boardwalk stretches from the foreground into the distance, inviting a stroll amidst the tall grasses. The lush greenery of the marsh extends on either side of the boardwalk, creating a sense of tranquility and natural beauty. The sky above is a stunning blue, dotted with fluffy white clouds, adding to the serene atmosphere of the scene. 

---
(base) krrishdholakia@Krrishs-MacBook-Air temp_py_folder % 

krrishdholakia avatar Sep 27 '24 15:09 krrishdholakia