
[Bug]: Exception in logging code: 'async_generator' object has no attribute 'choices'

Open · null-dev opened this issue Jan 15 '24

What happened?

Thanks for making this project! The following exception appears when I make an API call to my litellm-proxy setup:

Task exception was never retrieved
future: <Task finished name='Task-6' coro=<Logging.async_success_handler() done, defined at /usr/local/lib/python3.9/site-packages/litellm/utils.py:1347> exception=AttributeError("'async_generator' object has no attribute 'choices'")>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/litellm/utils.py", line 1357, in async_success_handler
    if result.choices[0].finish_reason is not None:  # if it's the last chunk
AttributeError: 'async_generator' object has no attribute 'choices'

It appears to be related to the logging code, and the API call itself still works fine, so I assume this has no impact on functionality.
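
For context, the failing line in utils.py indexes result.choices directly. A minimal sketch of the kind of type guard that would avoid the AttributeError (illustrative only, signature simplified, not litellm's actual code):

import inspect

async def async_success_handler(result, **kwargs):
    # On the cached streaming path, `result` can be an async generator
    # (see `convert_to_streaming_response_async` in the logs below),
    # which has no `.choices` attribute, so guard before indexing.
    if inspect.isasyncgen(result) or not hasattr(result, "choices"):
        return
    if result.choices[0].finish_reason is not None:  # if it's the last chunk
        ...  # rest of the success handling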

Here is my config.yml:

.openai_api_key: &openai_api_key "sk-XXX"
.mistral_api_key: &mistral_api_key "XXX"
.ollama_params: &ollama_params
  api_base: "http://ollama.svc.intra.vepta.org:11434"

model_list:
  - model_name: gpt-4 # user-facing model alias
    litellm_params: # all params accepted by litellm.completion() - https://docs.litellm.ai/docs/completion/input
      model: gpt-4-1106-preview
      api_key: *openai_api_key
  - model_name: gpt-3.5
    litellm_params:
      model: gpt-3.5-turbo-1106
      api_key: *openai_api_key
  - model_name: mistral-tiny
    litellm_params:
      model: mistral/mistral-tiny
      api_key: *mistral_api_key
  - model_name: mistral-medium
    litellm_params:
      model: mistral/mistral-medium
      api_key: *mistral_api_key
  - model_name: mistral-small
    litellm_params:
      model: mistral/mistral-small
      api_key: *mistral_api_key
  - model_name: local-mistral-tiny-q8
    litellm_params:
      <<: *ollama_params
      model: ollama/mistral:7b-instruct-v0.2-q8_0
  - model_name: local-mistral-small-dolphin-q4
    litellm_params:
      <<: *ollama_params
      model: ollama/dolphin-mixtral:latest
  - model_name: local-mistral-small-q3
    litellm_params:
      <<: *ollama_params
      model: ollama/mixtral-q3_k_s-gpu:latest

litellm_settings: # module level litellm settings - https://github.com/BerriAI/litellm/blob/main/litellm/__init__.py
  drop_params: True
  cache: True
  cache_params:
    type: s3
    s3_bucket_name: cache.litellm
    s3_region_name: us-east-1
    s3_aws_access_key_id: "XXX"
    s3_aws_secret_access_key: "XXX"
    s3_endpoint_url: "https://s3-v.svc.intra.vepta.org/"
  set_verbose: True

general_settings: 
  master_key: sk-hunter2

And here is my docker-compose.yml, nothing crazy here:

version: "3.9"
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-v1.17.5
    ports:
      - "127.0.0.1:8787:8787"
    volumes:
      - ./config.yml:/app/config.yml # Mount the local configuration file
    # You can change the port or number of workers as per your requirements or pass any new supported CLI argument. Make sure the port passed here matches the container port defined above under `ports`
    command: [ "--config", "/app/config.yml", "--port", "8787", "--num_workers", "8" ]
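
For reference, the request that triggers this (reconstructed from the proxy log below, which shows a python-requests client) is a streaming chat completion. A rough equivalent with the openai Python client, assuming the proxy is hit directly on the port mapped above (I have not confirmed this exact snippet reproduces the logging error):

from openai import OpenAI  # assumes the openai>=1.x client is installed

client = OpenAI(
    base_url="http://127.0.0.1:8787/v1",  # port mapped in docker-compose.yml
    api_key="sk-hunter2",                 # master_key from config.yml
)

# Streaming request matching the one shown in the log output below
stream = client.chat.completions.create(
    model="local-mistral-small-q3",
    messages=[{"role": "user", "content": "Describe to me USC section 7704."}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")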

Relevant log output

[nulldev@crystal litellm]$ docker compose up --force-recreate
[+] Running 1/0
 ✔ Container litellm-litellm-1  Recreated                                                                                                                                                                                              0.0s 
Attaching to litellm-1
litellm-1  | [2024-01-15 05:21:51 +0000] [1] [INFO] Starting gunicorn 21.2.0
litellm-1  | [2024-01-15 05:21:51 +0000] [1] [INFO] Listening at: http://0.0.0.0:8787 (1)
litellm-1  | [2024-01-15 05:21:51 +0000] [1] [INFO] Using worker: uvicorn.workers.UvicornWorker
litellm-1  | [2024-01-15 05:21:51 +0000] [7] [INFO] Booting worker with pid: 7
litellm-1  | [2024-01-15 05:21:51 +0000] [7] [INFO] Started server process [7]
litellm-1  | [2024-01-15 05:21:51 +0000] [7] [INFO] Waiting for application startup.
litellm-1  | [2024-01-15 05:21:51 +0000] [8] [INFO] Booting worker with pid: 8
litellm-1  | [2024-01-15 05:21:51 +0000] [8] [INFO] Started server process [8]
litellm-1  | [2024-01-15 05:21:51 +0000] [8] [INFO] Waiting for application startup.
litellm-1  | [2024-01-15 05:21:51 +0000] [9] [INFO] Booting worker with pid: 9
litellm-1  | [2024-01-15 05:21:51 +0000] [9] [INFO] Started server process [9]
litellm-1  | [2024-01-15 05:21:51 +0000] [9] [INFO] Waiting for application startup.
litellm-1  | [2024-01-15 05:21:51 +0000] [10] [INFO] Booting worker with pid: 10
litellm-1  | [2024-01-15 05:21:51 +0000] [11] [INFO] Booting worker with pid: 11
litellm-1  | [2024-01-15 05:21:51 +0000] [10] [INFO] Started server process [10]
litellm-1  | [2024-01-15 05:21:51 +0000] [10] [INFO] Waiting for application startup.
litellm-1  | [2024-01-15 05:21:51 +0000] [11] [INFO] Started server process [11]
litellm-1  | [2024-01-15 05:21:51 +0000] [11] [INFO] Waiting for application startup.
litellm-1  | [2024-01-15 05:21:52 +0000] [12] [INFO] Booting worker with pid: 12
litellm-1  | [2024-01-15 05:21:52 +0000] [12] [INFO] Started server process [12]
litellm-1  | [2024-01-15 05:21:52 +0000] [12] [INFO] Waiting for application startup.
litellm-1  | [2024-01-15 05:21:52 +0000] [13] [INFO] Booting worker with pid: 13
litellm-1  | [2024-01-15 05:21:52 +0000] [13] [INFO] Started server process [13]
litellm-1  | [2024-01-15 05:21:52 +0000] [13] [INFO] Waiting for application startup.
litellm-1  | [2024-01-15 05:21:52 +0000] [14] [INFO] Booting worker with pid: 14
litellm-1  | [2024-01-15 05:21:52 +0000] [14] [INFO] Started server process [14]
litellm-1  | [2024-01-15 05:21:52 +0000] [14] [INFO] Waiting for application startup.
litellm-1  | [2024-01-15 05:21:52 +0000] [7] [INFO] Application startup complete.
litellm-1  | [2024-01-15 05:21:52 +0000] [8] [INFO] Application startup complete.
litellm-1  | [2024-01-15 05:21:52 +0000] [9] [INFO] Application startup complete.
litellm-1  | [2024-01-15 05:21:52 +0000] [10] [INFO] Application startup complete.
litellm-1  | [2024-01-15 05:21:52 +0000] [11] [INFO] Application startup complete.
litellm-1  | [2024-01-15 05:21:52 +0000] [12] [INFO] Application startup complete.
litellm-1  | [2024-01-15 05:21:52 +0000] [13] [INFO] Application startup complete.
litellm-1  | [2024-01-15 05:21:52 +0000] [14] [INFO] Application startup complete.
litellm-1  | 
litellm-1  | LiteLLM: Test your local proxy with: "litellm --test" This runs an openai.ChatCompletion request to your proxy [In a new terminal tab]
litellm-1  | 
litellm-1  | LiteLLM: Curl Command Test for your local proxy
litellm-1  |  curl --location 'http://0.0.0.0:8787/chat/completions' \
litellm-1  |                     --header 'Content-Type: application/json' \
litellm-1  |                     --data ' {
litellm-1  |                     "model": "gpt-3.5-turbo",
litellm-1  |                     "messages": [
litellm-1  |                         {
litellm-1  |                         "role": "user",
litellm-1  |                         "content": "what llm are you"
litellm-1  |                         }
litellm-1  |                     ]
litellm-1  |                     }'
litellm-1  |                     
litellm-1  | 
litellm-1  |                      
litellm-1  | 
litellm-1  | Docs: https://docs.litellm.ai/docs/simple_proxy
litellm-1  | 
litellm-1  | See all Router/Swagger docs on http://0.0.0.0:8787 
litellm-1  | 
litellm-1  | 
litellm-1  | #------------------------------------------------------------#
litellm-1  | #                                                            #
litellm-1  | #               'A feature I really want is...'               #
litellm-1  | #        https://github.com/BerriAI/litellm/issues/new        #
litellm-1  | #                                                            #
litellm-1  | #------------------------------------------------------------#
litellm-1  | 
litellm-1  |  Thank you for using LiteLLM! - Krrish & Ishaan
litellm-1  | 
litellm-1  | 
litellm-1  | 
litellm-1  | Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
litellm-1  | 
litellm-1  | 
litellm-1  | 
litellm-1  | Setting Cache on Proxy
litellm-1  | Set Cache on LiteLLM Proxy: {'bucket_name': 'cache.litellm', 's3_client': <botocore.client.S3 object at 0x7fd278775880>}
litellm-1  | LiteLLM: Proxy initialized with Config, Set models:
litellm-1  |     gpt-4
litellm-1  |     gpt-3.5
litellm-1  |     mistral-tiny
litellm-1  |     mistral-medium
litellm-1  |     mistral-small
litellm-1  |     local-mistral-tiny-q8
litellm-1  |     local-mistral-small-dolphin-q4
litellm-1  |     local-mistral-small-q3
litellm-1  | set cache: key: 39fe5801-0bbc-4e70-bbf5-fcb15ff402db_async_client; value: <openai.AsyncOpenAI object at 0x7fd278678e50>
litellm-1  | set cache: key: 39fe5801-0bbc-4e70-bbf5-fcb15ff402db_client; value: <openai.OpenAI object at 0x7fd27861c1c0>
litellm-1  | set cache: key: 39fe5801-0bbc-4e70-bbf5-fcb15ff402db_stream_async_client; value: <openai.AsyncOpenAI object at 0x7fd27862f700>
litellm-1  | set cache: key: 39fe5801-0bbc-4e70-bbf5-fcb15ff402db_stream_client; value: <openai.OpenAI object at 0x7fd278639c70>
litellm-1  | set cache: key: 10c05ce2-7728-46f5-8905-aad569e3a8c3_async_client; value: <openai.AsyncOpenAI object at 0x7fd27864b1f0>
litellm-1  | set cache: key: 10c05ce2-7728-46f5-8905-aad569e3a8c3_client; value: <openai.OpenAI object at 0x7fd2785d3760>
litellm-1  | set cache: key: 10c05ce2-7728-46f5-8905-aad569e3a8c3_stream_async_client; value: <openai.AsyncOpenAI object at 0x7fd2785deca0>
litellm-1  | set cache: key: 10c05ce2-7728-46f5-8905-aad569e3a8c3_stream_client; value: <openai.OpenAI object at 0x7fd2785ed250>
litellm-1  | set cache: key: 7076c534-bcf2-4edd-b698-70da58c1a096_async_client; value: <openai.AsyncOpenAI object at 0x7fd2785f9910>
litellm-1  | set cache: key: 7076c534-bcf2-4edd-b698-70da58c1a096_client; value: <openai.OpenAI object at 0x7fd278602d00>
litellm-1  | set cache: key: 7076c534-bcf2-4edd-b698-70da58c1a096_stream_async_client; value: <openai.AsyncOpenAI object at 0x7fd278592280>
litellm-1  | set cache: key: 7076c534-bcf2-4edd-b698-70da58c1a096_stream_client; value: <openai.OpenAI object at 0x7fd27859c7f0>
litellm-1  | set cache: key: 82767e40-8989-4bd1-8ca8-7344572f5018_async_client; value: <openai.AsyncOpenAI object at 0x7fd2785a7eb0>
litellm-1  | set cache: key: 82767e40-8989-4bd1-8ca8-7344572f5018_client; value: <openai.OpenAI object at 0x7fd2785b72e0>
litellm-1  | set cache: key: 82767e40-8989-4bd1-8ca8-7344572f5018_stream_async_client; value: <openai.AsyncOpenAI object at 0x7fd2785c1820>
litellm-1  | set cache: key: 82767e40-8989-4bd1-8ca8-7344572f5018_stream_client; value: <openai.OpenAI object at 0x7fd2785ccd90>
litellm-1  | set cache: key: c7dd6658-f71c-4b4b-b17a-0de12db64426_async_client; value: <openai.AsyncOpenAI object at 0x7fd27855c490>
litellm-1  | set cache: key: c7dd6658-f71c-4b4b-b17a-0de12db64426_client; value: <openai.OpenAI object at 0x7fd278566880>
litellm-1  | set cache: key: c7dd6658-f71c-4b4b-b17a-0de12db64426_stream_async_client; value: <openai.AsyncOpenAI object at 0x7fd278570dc0>
litellm-1  | set cache: key: c7dd6658-f71c-4b4b-b17a-0de12db64426_stream_client; value: <openai.OpenAI object at 0x7fd278582370>
litellm-1  | LiteLLM Proxy: INITIALIZING LITELLM CALLBACKS!
litellm-1  | callback: <bound method Router.deployment_callback_on_failure of <litellm.router.Router object at 0x7fd278678c40>>
litellm-1  | callback: <litellm.proxy.hooks.parallel_request_limiter.MaxParallelRequestsHandler object at 0x7fd27afd2040>
litellm-1  | callback: cache
litellm-1  | callback: <litellm.proxy.hooks.max_budget_limiter.MaxBudgetLimiter object at 0x7fd27afd2070>
litellm-1  | Inside Max Parallel Request Pre-Call Hook
litellm-1  | Inside Max Budget Limiter Pre-Call Hook
litellm-1  | get cache: cache key: None_user_api_key_user_id; local_only: False
litellm-1  | in_memory_result: None
litellm-1  | get cache: cache result: None
litellm-1  | LiteLLM Proxy: final data being sent to completion call: {'model': 'local-mistral-small-q3', 'stream': True, 'messages': [{'role': 'user', 'content': 'Describe to me USC section 7704.'}], 'proxy_server_request': {'url': 'http://ai-gateway.svc.intra.vepta.org/v1/chat/completions', 'method': 'POST', 'headers': {'host': 'ai-gateway.svc.intra.vepta.org', 'x-forwarded-proto': 'https', 'connection': 'close', 'content-length': '122', 'user-agent': 'python-requests/2.28.1', 'accept-encoding': 'gzip, deflate', 'accept': '*/*', 'authorization': 'Bearer sk-hunter2', 'content-type': 'application/json'}, 'body': {'model': 'local-mistral-small-q3', 'stream': True, 'messages': [{'role': 'user', 'content': 'Describe to me USC section 7704.'}]}}, 'metadata': {'user_api_key': 'sk-hunter2', 'headers': {'host': 'ai-gateway.svc.intra.vepta.org', 'x-forwarded-proto': 'https', 'connection': 'close', 'content-length': '122', 'user-agent': 'python-requests/2.28.1', 'accept-encoding': 'gzip, deflate', 'accept': '*/*', 'authorization': 'Bearer sk-hunter2', 'content-type': 'application/json'}, 'user_api_key_user_id': None}, 'request_timeout': 600}
litellm-1  | get cache: cache key: 05-22:cooldown_models; local_only: False
litellm-1  | in_memory_result: None
litellm-1  | get cache: cache result: None
litellm-1  | get cache: cache key: a6117501-5a08-42ae-877b-2a990592c090_stream_async_client; local_only: True
litellm-1  | in_memory_result: None
litellm-1  | get cache: cache result: None
litellm-1  | get cache: cache key: a6117501-5a08-42ae-877b-2a990592c090_stream_async_client; local_only: True
litellm-1  | in_memory_result: None
litellm-1  | get cache: cache result: None
litellm-1  | 
litellm-1  | 
litellm-1  | Request to litellm:
litellm-1  | litellm.acompletion(api_base='http://ollama.svc.intra.vepta.org:11434', model='ollama/mixtral-q3_k_s-gpu:latest', messages=[{'role': 'user', 'content': 'Describe to me USC section 7704.'}], caching=True, client=None, timeout=None, stream=True, proxy_server_request={'url': 'http://ai-gateway.svc.intra.vepta.org/v1/chat/completions', 'method': 'POST', 'headers': {'host': 'ai-gateway.svc.intra.vepta.org', 'x-forwarded-proto': 'https', 'connection': 'close', 'content-length': '122', 'user-agent': 'python-requests/2.28.1', 'accept-encoding': 'gzip, deflate', 'accept': '*/*', 'authorization': 'Bearer sk-hunter2', 'content-type': 'application/json'}, 'body': {'model': 'local-mistral-small-q3', 'stream': True, 'messages': [{'role': 'user', 'content': 'Describe to me USC section 7704.'}]}}, metadata={'user_api_key': 'sk-hunter2', 'headers': {'host': 'ai-gateway.svc.intra.vepta.org', 'x-forwarded-proto': 'https', 'connection': 'close', 'content-length': '122', 'user-agent': 'python-requests/2.28.1', 'accept-encoding': 'gzip, deflate', 'accept': '*/*', 'authorization': 'Bearer sk-hunter2', 'content-type': 'application/json'}, 'user_api_key_user_id': None, 'model_group': 'local-mistral-small-q3', 'deployment': 'ollama/mixtral-q3_k_s-gpu:latest', 'caching_groups': None}, request_timeout=600, model_info={'id': 'a6117501-5a08-42ae-877b-2a990592c090'}, max_retries=0)
litellm-1  | 
litellm-1  | 
litellm-1  | Initialized litellm callbacks, Async Success Callbacks: ['cache', <litellm.proxy.hooks.parallel_request_limiter.MaxParallelRequestsHandler object at 0x7fd27afd2040>, <litellm.proxy.hooks.max_budget_limiter.MaxBudgetLimiter object at 0x7fd27afd2070>]
litellm-1  | Task exception was never retrieved
litellm-1  | future: <Task finished name='Task-6' coro=<Logging.async_success_handler() done, defined at /usr/local/lib/python3.9/site-packages/litellm/utils.py:1347> exception=AttributeError("'async_generator' object has no attribute 'choices'")>
litellm-1  | Traceback (most recent call last):
litellm-1  |   File "/usr/local/lib/python3.9/site-packages/litellm/utils.py", line 1357, in async_success_handler
litellm-1  |     if result.choices[0].finish_reason is not None:  # if it's the last chunk
litellm-1  | AttributeError: 'async_generator' object has no attribute 'choices'
litellm-1  | callback: <bound method Router.deployment_callback_on_failure of <litellm.router.Router object at 0x7fd278678c40>>
litellm-1  | callback: <litellm.proxy.hooks.parallel_request_limiter.MaxParallelRequestsHandler object at 0x7fd27afd2040>
litellm-1  | callback: cache
litellm-1  | callback: <litellm.proxy.hooks.max_budget_limiter.MaxBudgetLimiter object at 0x7fd27afd2070>
litellm-1  | litellm.cache: <litellm.caching.Cache object at 0x7fd27a2d43d0>
litellm-1  | kwargs[caching]: True; litellm.cache: <litellm.caching.Cache object at 0x7fd27a2d43d0>
litellm-1  | INSIDE CHECKING CACHE
litellm-1  | Checking Cache
litellm-1  | 
litellm-1  | Getting Cache key. Kwargs: {'api_base': 'http://ollama.svc.intra.vepta.org:11434', 'model': 'ollama/mixtral-q3_k_s-gpu:latest', 'messages': [{'role': 'user', 'content': 'Describe to me USC section 7704.'}], 'caching': True, 'client': None, 'timeout': None, 'stream': True, 'proxy_server_request': {'url': 'http://ai-gateway.svc.intra.vepta.org/v1/chat/completions', 'method': 'POST', 'headers': {'host': 'ai-gateway.svc.intra.vepta.org', 'x-forwarded-proto': 'https', 'connection': 'close', 'content-length': '122', 'user-agent': 'python-requests/2.28.1', 'accept-encoding': 'gzip, deflate', 'accept': '*/*', 'authorization': 'Bearer sk-hunter2', 'content-type': 'application/json'}, 'body': {'model': 'local-mistral-small-q3', 'stream': True, 'messages': [{'role': 'user', 'content': 'Describe to me USC section 7704.'}]}}, 'metadata': {'user_api_key': 'sk-hunter2', 'headers': {'host': 'ai-gateway.svc.intra.vepta.org', 'x-forwarded-proto': 'https', 'connection': 'close', 'content-length': '122', 'user-agent': 'python-requests/2.28.1', 'accept-encoding': 'gzip, deflate', 'accept': '*/*', 'authorization': 'Bearer sk-hunter2', 'content-type': 'application/json'}, 'user_api_key_user_id': None, 'model_group': 'local-mistral-small-q3', 'deployment': 'ollama/mixtral-q3_k_s-gpu:latest', 'caching_groups': None}, 'request_timeout': 600, 'model_info': {'id': 'a6117501-5a08-42ae-877b-2a990592c090'}, 'max_retries': 0, 'litellm_call_id': 'a2a02815-e3f1-402e-8fcf-0c3d1e1a551c', 'litellm_logging_obj': <litellm.utils.Logging object at 0x7fd277d13d00>}
litellm-1  | 
litellm-1  | Created cache key: model: local-mistral-small-q3messages: [{'role': 'user', 'content': 'Describe to me USC section 7704.'}]
litellm-1  | Hashed cache key (SHA-256): 54c549ac2d57d197f36a42f7dcad4dfa0e9c1e9be2b91bd496f57a3ae32f9667
litellm-1  | Get S3 Cache: key: 54c549ac2d57d197f36a42f7dcad4dfa0e9c1e9be2b91bd496f57a3ae32f9667
litellm-1  | Got S3 Cache: key: 54c549ac2d57d197f36a42f7dcad4dfa0e9c1e9be2b91bd496f57a3ae32f9667, cached_response {'timestamp': 1705294423.504799, 'response': '{"id":"chatcmpl-d5be66e3-2831-4f34-a9ad-49065ea5dbc8","choices":[{"finish_reason":"stop","index":0,"message":{"content":"Section 7704 of the United States Code (USC) is part of the Internal Revenue Code and pertains to the taxation of certain annuity contracts issued by life insurance companies. The section outlines rules for determining whether an annuity contract is considered a \\"modified endowment contract\\" (MEC), which can have significant tax implications for the policyholder.\\n\\nA MEC is an annuity contract that receives more than the allowed amount of premiums within a specified period, typically within the first seven years after the contract is issued. When an annuity contract becomes a MEC, it triggers certain tax consequences:\\n\\n1. Withdrawals before age 59½ are subject to a 10% penalty tax, similar to retirement accounts like the IRA or 401(k).\\n2. If the annuity is surrendered (cashed in), the policyholder must recognize ordinary income tax on any gains within the contract. This differs from non-MEC annuities, which allow for tax-free growth of investments and only tax the portion of withdrawals that represents investment earnings.\\n\\nIn summary, USC Section 7704 deals with MECs and their specific taxation rules. It is crucial for policyholders and financial professionals to understand these rules when buying or selling annuity contracts to avoid unexpected tax consequences.","role":"assistant"}}],"created":1705294415,"model":"mixtral-q3_k_s-gpu:latest","object":"chat.completion","system_fingerprint":null,"usage":{"completion_tokens":256,"prompt_tokens":9,"total_tokens":265}}'}. Type Response <class 'dict'>
litellm-1  | Cache Hit!
litellm-1  | Async Wrapper: Completed Call, calling async_success_handler: <bound method Logging.async_success_handler of <litellm.utils.Logging object at 0x7fd277d13d00>>
litellm-1  | self.optional_params: {}
litellm-1  | Logging Details LiteLLM-Success Call
litellm-1  | success callbacks: ['cache', <litellm.proxy.hooks.parallel_request_limiter.MaxParallelRequestsHandler object at 0x7fd27afd2040>, <litellm.proxy.hooks.max_budget_limiter.MaxBudgetLimiter object at 0x7fd27afd2070>]
litellm-1  | success_callback: reaches cache for logging!
litellm-1  | success_callback: reaches cache for logging, there is no complete_streaming_response. Kwargs={'model': 'mixtral-q3_k_s-gpu:latest', 'messages': [{'role': 'user', 'content': 'Describe to me USC section 7704.'}], 'optional_params': {}, 'litellm_params': {'logger_fn': None, 'acompletion': True, 'metadata': {'user_api_key': 'sk-hunter2', 'headers': {'host': 'ai-gateway.svc.intra.vepta.org', 'x-forwarded-proto': 'https', 'connection': 'close', 'content-length': '122', 'user-agent': 'python-requests/2.28.1', 'accept-encoding': 'gzip, deflate', 'accept': '*/*', 'authorization': 'Bearer sk-hunter2', 'content-type': 'application/json'}, 'user_api_key_user_id': None, 'model_group': 'local-mistral-small-q3', 'deployment': 'ollama/mixtral-q3_k_s-gpu:latest', 'caching_groups': None}, 'model_info': {'id': 'a6117501-5a08-42ae-877b-2a990592c090'}, 'proxy_server_request': {'url': 'http://ai-gateway.svc.intra.vepta.org/v1/chat/completions', 'method': 'POST', 'headers': {'host': 'ai-gateway.svc.intra.vepta.org', 'x-forwarded-proto': 'https', 'connection': 'close', 'content-length': '122', 'user-agent': 'python-requests/2.28.1', 'accept-encoding': 'gzip, deflate', 'accept': '*/*', 'authorization': 'Bearer sk-hunter2', 'content-type': 'application/json'}, 'body': {'model': 'local-mistral-small-q3', 'stream': True, 'messages': [{'role': 'user', 'content': 'Describe to me USC section 7704.'}]}}, 'preset_cache_key': None, 'stream_response': {}}, 'start_time': datetime.datetime(2024, 1, 15, 5, 22, 18, 864249), 'stream': True, 'user': None, 'call_type': 'acompletion', 'input': [{'role': 'user', 'content': 'Describe to me USC section 7704.'}], 'api_key': None, 'original_response': '<async_generator object convert_to_streaming_response_async at 0x7fd27858dee0>', 'additional_args': None, 'log_event_type': 'successful_api_call', 'end_time': datetime.datetime(2024, 1, 15, 5, 22, 19, 388687), 'cache_hit': True}
litellm-1  | 
litellm-1  | 
litellm-1  | Async success callbacks: ['cache', <litellm.proxy.hooks.parallel_request_limiter.MaxParallelRequestsHandler object at 0x7fd27afd2040>, <litellm.proxy.hooks.max_budget_limiter.MaxBudgetLimiter object at 0x7fd27afd2070>]
litellm-1  | 172.24.0.1:41866 - "POST /v1/chat/completions HTTP/1.0" 200
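
Judging from the log above, the exception fires on the cache-hit streaming path: original_response is an async_generator from convert_to_streaming_response_async, and an async generator can only be iterated, never attribute-accessed. A toy stand-in (the chunk shape here is made up, not litellm's actual types) that shows the difference:

import asyncio

async def fake_cached_stream():
    # Stand-in for the async generator seen in the log; chunk shape is invented.
    yield {"choices": [{"delta": {"content": "cached"}, "finish_reason": None}]}
    yield {"choices": [{"delta": {}, "finish_reason": "stop"}]}

async def main():
    result = fake_cached_stream()
    # result.choices -> AttributeError, exactly as in the traceback above
    async for chunk in result:  # the only valid way to consume it
        print(chunk["choices"][0]["finish_reason"])

asyncio.run(main())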


null-dev · Jan 15 '24

Hey @null-dev can you give me a curl to repro this?

Seeing this late

krrishdholakia · Feb 20 '24

bump @null-dev can we see the curl command to repro the error?

ishaan-jaff · Apr 06 '24

closing due to inactivity. @null-dev can we hop on a call sometime this week / next? I'd love to learn how we can improve litellm for you.

Sharing my cal for your convenience: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat

ishaan-jaff · Jun 12 '24

Hey, sorry about the lack of reply. I was unable to reproduce this, as the request was coming from https://github.com/open-webui/open-webui. I don't have time for a call right now, but I'll let you know if I ever have any suggestions!

null-dev · Jun 16 '24