[Bug] Litellm Proxy hangs with Claude 3 w/ multiple system messages
When using the LiteLLM proxy with claude-3-sonnet and the Microsoft AutoGen framework (via the OpenAI specification), LiteLLM hangs: it never sends a request out and gives no feedback or error, so I have to kill the process in the terminal manually. However, it works fine when I send a curl request directly to the proxy server URL, so I assume something in the AutoGen request is causing this issue. I tried re-installing everything with no luck.
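For reference, here is the AutoGen payload expressed as a direct call against the proxy with the OpenAI Python client (a minimal sketch; base URL, model name, placeholder key, and message shapes are taken from the proxy logs below, with the long system prompt elided):

```python
# Minimal sketch of the failing request, bypassing AutoGen entirely.
# Values are copied from the proxy logs below; "Notrequired" is the
# placeholder key since no master key is configured on the proxy.
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:4000", api_key="Notrequired")

response = client.chat.completions.create(
    model="claude-3-sonnet-20240229",
    temperature=0,
    stream=False,
    messages=[
        {"role": "system", "content": "You are in a role play game. ..."},
        {"role": "user", "name": "User_Proxy",
         "content": "Fullfill this question from the rep: hi"},
        # The second system message, added by AutoGen's GroupChat
        # speaker selection -- see the diagnosis further down.
        {"role": "system",
         "content": "Read the above conversation. Then select the next "
                     "role ... Only return the role."},
    ],
)
print(response.choices[0].message.content)  # never prints when the proxy hangs
```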
Tried on Mac with the latest LiteLLM proxy, Python 3.11 and 3.12, and pyautogen 0.2.20.
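For completeness, the proxy config, reconstructed from the "Loaded config YAML" dump in the logs below (key redacted):

```yaml
model_list:
  - model_name: claude-3-sonnet-20240229
    litellm_params:
      model: claude-3-sonnet-20240229
      api_key: "removed"
```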
Here are the logs from the proxy:
14:53:19 - LiteLLM Proxy:DEBUG: proxy_server.py:1760 - Loaded config YAML (api_key and environment_variables are not shown): { "model_list": [ { "model_name": "claude-3-sonnet-20240229", "litellm_params": { "model": "claude-3-sonnet-20240229", "api_key": "removed" } } ] }
LiteLLM: Proxy initialized with Config, Set models: claude-3-sonnet-20240229
14:53:19 - LiteLLM Router:DEBUG: router.py:2091 - Initialized Model List [{'model_name': 'claude-3-sonnet-20240229', 'litellm_params': {'model': 'claude-3-sonnet-20240229', 'api_key': 'removed'}, 'model_info': {'id': 'c029cce1-0976-4c42-a3a3-a8a5eb5e407a'}}]
14:53:19 - LiteLLM Router:DEBUG: router.py:286 - Intialized router with Routing strategy: simple-shuffle
14:53:19 - LiteLLM Proxy:DEBUG: utils.py:33 - INITIALIZING LITELLM CALLBACKS!
14:53:19 - LiteLLM:DEBUG: utils.py:831 - callback: <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x111ab5650>
14:53:19 - LiteLLM:DEBUG: utils.py:831 - callback: <bound method Router.deployment_callback_on_failure of <litellm.router.Router object at 0x111b7f790>>
14:53:19 - LiteLLM:DEBUG: utils.py:831 - callback: <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x111ab5690>
14:53:19 - LiteLLM:DEBUG: utils.py:831 - callback: <bound method Router.deployment_callback_on_failure of <litellm.router.Router object at 0x111b94a90>>
14:53:19 - LiteLLM:DEBUG: utils.py:831 - callback: <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x110470750>
14:53:19 - LiteLLM:DEBUG: utils.py:831 - callback: <bound method ProxyLogging.response_taking_too_long_callback of <litellm.proxy.utils.ProxyLogging object at 0x111ab5210>>
14:53:19 - LiteLLM Proxy:DEBUG: proxy_server.py:2707 - prisma client - None
14:53:19 - LiteLLM Proxy:DEBUG: proxy_server.py:2711 - custom_db_client client - None
14:53:19 - LiteLLM Proxy:DEBUG: proxy_server.py:2762 - custom_db_client client None. Master_key: None
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:4000 (Press CTRL+C to quit)
14:53:26 - LiteLLM Proxy:DEBUG: proxy_server.py:3048 - Request Headers: Headers({'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'OpenAI/Python 1.14.2', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.14.2', 'x-stainless-os': 'MacOS', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.11.7', 'authorization': 'Bearer Notrequired', 'x-stainless-async': 'false', 'content-length': '1025'})
14:53:26 - LiteLLM Proxy:DEBUG: proxy_server.py:3054 - receiving data: {'messages': [{'content': "You are in a role play game. The following roles are available:\nUser_Proxy: A computer terminal that performs no other action than running Python scripts (provided to it quoted in python code blocks), or sh shell scripts (provided to it quoted in sh code blocks).\nData_Analysis_agent: You are in a group chat.\n As a Data Analyst, you use your analytics skills to analyze the customer data given to you and provide accurate,\n insightful information, and suggestions..\n\nRead the following conversation.\nThen select the next role from ['User_Proxy', 'Data_Analysis_agent'] to play. Only return the role.", 'role': 'system'}, {'content': 'Fullfill this question from the rep: hi', 'role': 'user', 'name': 'User_Proxy'}, {'role': 'system', 'content': "Read the above conversation. Then select the next role from ['User_Proxy', 'Data_Analysis_agent'] to play. Only return the role."}], 'model': 'claude-3-sonnet-20240229', 'stream': False, 'temperature': 0, 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'OpenAI/Python 1.14.2', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.14.2', 'x-stainless-os': 'MacOS', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.11.7', 'authorization': 'Bearer Notrequired', 'x-stainless-async': 'false', 'content-length': '1025'}, 'body': {'messages': [{'content': "You are in a role play game. The following roles are available:\nUser_Proxy: A computer terminal that performs no other action than running Python scripts (provided to it quoted in python code blocks), or sh shell scripts (provided to it quoted in sh code blocks).\nData_Analysis_agent: You are in a group chat.\n As a Data Analyst, you use your analytics skills to analyze the customer data given to you and provide accurate,\n insightful information, and suggestions ..\n\nRead the following conversation.\nThen select the next role from ['User_Proxy', 'Data_Analysis_agent'] to play. Only return the role.", 'role': 'system'}, {'content': 'Fullfill this question from the rep: hi', 'role': 'user', 'name': 'User_Proxy'}, {'role': 'system', 'content': "Read the above conversation. Then select the next role from ['User_Proxy', 'Data_Analysis_agent'] to play. Only return the role."}], 'model': 'claude-3-sonnet-20240229', 'stream': False, 'temperature': 0}}}
14:53:26 - LiteLLM Proxy:DEBUG: utils.py:33 - Inside Proxy Logging Pre-call hook!
14:53:26 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:21 - Inside Max Parallel Request Pre-Call Hook
14:53:26 - LiteLLM:DEBUG: caching.py:21 - get cache: cache key: Notrequired::2024-03-22-14-53::request_count; local_only: False
14:53:26 - LiteLLM:DEBUG: caching.py:21 - in_memory_result: None
14:53:26 - LiteLLM:DEBUG: caching.py:21 - get cache: cache result: None
14:53:26 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:21 - current: None
14:53:26 - LiteLLM:DEBUG: caching.py:21 - get cache: cache key: None::2024-03-22-14-53::request_count; local_only: False
14:53:26 - LiteLLM:DEBUG: caching.py:21 - in_memory_result: None
14:53:26 - LiteLLM:DEBUG: caching.py:21 - get cache: cache result: None
14:53:26 - LiteLLM:DEBUG: caching.py:21 - set cache: key: None::2024-03-22-14-53::request_count; value: {'current_requests': 1, 'current_tpm': 0, 'current_rpm': 0}
14:53:26 - LiteLLM:DEBUG: caching.py:21 - InMemoryCache: set_cache
14:53:26 - LiteLLM:DEBUG: caching.py:21 - get cache: cache key: None_user_api_key_user_id; local_only: False
14:53:26 - LiteLLM:DEBUG: caching.py:21 - in_memory_result: None
14:53:26 - LiteLLM:DEBUG: caching.py:21 - get cache: cache result: None
14:53:26 - LiteLLM Proxy:DEBUG: utils.py:33 - final data being sent to completion call: {'messages': [{'content': "You are in a role play game. The following roles are available:\nUser_Proxy: A computer terminal that performs no other action than running Python scripts (provided to it quoted in python code blocks), or sh shell scripts (provided to it quoted in sh code blocks).\nData_Analysis_agent: You are in a group chat.\n As a Data Analyst, you use your analytics skills to analyze the customer data given to you and provide accurate,\n insightful information, and suggestions ..\n\nRead the following conversation.\nThen select the next role from ['User_Proxy', 'Data_Analysis_agent'] to play. Only return the role.", 'role': 'system'}, {'content': 'Fullfill this question from the rep: hi', 'role': 'user', 'name': 'User_Proxy'}, {'role': 'system', 'content': "Read the above conversation. Then select the next role from ['User_Proxy', 'Data_Analysis_agent'] to play. Only return the role."}], 'model': 'claude-3-sonnet-20240229', 'stream': False, 'temperature': 0, 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'OpenAI/Python 1.14.2', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.14.2', 'x-stainless-os': 'MacOS', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.11.7', 'authorization': 'Bearer Notrequired', 'x-stainless-async': 'false', 'content-length': '1025'}, 'body': {'messages': [{'content': "You are in a role play game. The following roles are available:\nUser_Proxy: A computer terminal that performs no other action than running Python scripts (provided to it quoted in python code blocks), or sh shell scripts (provided to it quoted in sh code blocks).\nData_Analysis_agent: You are in a group chat.\n As a Data Analyst, you use your analytics skills to analyze the customer data given to you and provide accurate,\n insightful information, and suggestions ..\n\nRead the following conversation.\nThen select the next role from ['User_Proxy', 'Data_Analysis_agent'] to play. Only return the role.", 'role': 'system'}, {'content': 'Fullfill this question from the rep: hi', 'role': 'user', 'name': 'User_Proxy'}, {'role': 'system', 'content': "Read the above conversation. Then select the next role from ['User_Proxy', 'Data_Analysis_agent'] to play. Only return the role."}], 'model': 'claude-3-sonnet-20240229', 'stream': False, 'temperature': 0}}, 'metadata': {'user_api_key': 'Notrequired', 'user_api_key_alias': None, 'user_api_key_user_id': None, 'user_api_key_team_id': None, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'OpenAI/Python 1.14.2', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.14.2', 'x-stainless-os': 'MacOS', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.11.7', 'x-stainless-async': 'false', 'content-length': '1025'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions'}, 'request_timeout': 600}
14:53:26 - LiteLLM Router:DEBUG: router.py:1220 - Inside async function with retries: args - (); kwargs - {'stream': False, 'temperature': 0, 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'OpenAI/Python 1.14.2', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.14.2', 'x-stainless-os': 'MacOS', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.11.7', 'authorization': 'Bearer Notrequired', 'x-stainless-async': 'false', 'content-length': '1025'}, 'body': {'messages': [{'content': "You are in a role play game. The following roles are available:\nUser_Proxy: A computer terminal that performs no other action than running Python scripts (provided to it quoted in python code blocks), or sh shell scripts (provided to it quoted in sh code blocks).\nData_Analysis_agent: You are in a group chat.\n As a Data Analyst, you use your analytics skills to analyze the customer data given to you and provide accurate,\n insightful information, and suggestions ..\n\nRead the following conversation.\nThen select the next role from ['User_Proxy', 'Data_Analysis_agent'] to play. Only return the role.", 'role': 'system'}, {'content': 'Fullfill this question from the rep: hi', 'role': 'user', 'name': 'User_Proxy'}, {'role': 'system', 'content': "Read the above conversation. Then select the next role from ['User_Proxy', 'Data_Analysis_agent'] to play. Only return the role."}], 'model': 'claude-3-sonnet-20240229', 'stream': False, 'temperature': 0}}, 'metadata': {'user_api_key': 'Notrequired', 'user_api_key_alias': None, 'user_api_key_user_id': None, 'user_api_key_team_id': None, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'OpenAI/Python 1.14.2', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.14.2', 'x-stainless-os': 'MacOS', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.11.7', 'x-stainless-async': 'false', 'content-length': '1025'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'model_group': 'claude-3-sonnet-20240229'}, 'request_timeout': 600, 'model': 'claude-3-sonnet-20240229', 'messages': [{'content': "You are in a role play game. The following roles are available:\nUser_Proxy: A computer terminal that performs no other action than running Python scripts (provided to it quoted in python code blocks), or sh shell scripts (provided to it quoted in sh code blocks).\nData_Analysis_agent: You are in a group chat.\n As a Data Analyst, you use your analytics skills to analyze the customer data given to you and provide accurate,\n insightful information, and suggestions ..\n\nRead the following conversation.\nThen select the next role from ['User_Proxy', 'Data_Analysis_agent'] to play. Only return the role.", 'role': 'system'}, {'content': 'Fullfill this question from the rep: hi', 'role': 'user', 'name': 'User_Proxy'}, {'role': 'system', 'content': "Read the above conversation. Then select the next role from ['User_Proxy', 'Data_Analysis_agent'] to play. Only return the role."}], 'original_function': <bound method Router._acompletion of <litellm.router.Router object at 0x111b7f790>>, 'num_retries': 0}
14:53:26 - LiteLLM Router:DEBUG: router.py:1228 - async function w/ retries: original_function - <bound method Router._acompletion of <litellm.router.Router object at 0x111b7f790>>
14:53:26 - LiteLLM Router:DEBUG: router.py:404 - Inside _acompletion()- model: claude-3-sonnet-20240229; kwargs: {'stream': False, 'temperature': 0, 'proxy_server_request': {'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'OpenAI/Python 1.14.2', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.14.2', 'x-stainless-os': 'MacOS', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.11.7', 'authorization': 'Bearer Notrequired', 'x-stainless-async': 'false', 'content-length': '1025'}, 'body': {'messages': [{'content': "You are in a role play game. The following roles are available:\nUser_Proxy: A computer terminal that performs no other action than running Python scripts (provided to it quoted in python code blocks), or sh shell scripts (provided to it quoted in sh code blocks).\nData_Analysis_agent: You are in a group chat.\n As a Data Analyst, you use your analytics skills to analyze the customer data given to you and provide accurate,\n insightful information, and suggestions ..\n\nRead the following conversation.\nThen select the next role from ['User_Proxy', 'Data_Analysis_agent'] to play. Only return the role.", 'role': 'system'}, {'content': 'Fullfill this question from the rep: hi', 'role': 'user', 'name': 'User_Proxy'}, {'role': 'system', 'content': "Read the above conversation. Then select the next role from ['User_Proxy', 'Data_Analysis_agent'] to play. Only return the role."}], 'model': 'claude-3-sonnet-20240229', 'stream': False, 'temperature': 0}}, 'metadata': {'user_api_key': 'Notrequired', 'user_api_key_alias': None, 'user_api_key_user_id': None, 'user_api_key_team_id': None, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'OpenAI/Python 1.14.2', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.14.2', 'x-stainless-os': 'MacOS', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.11.7', 'x-stainless-async': 'false', 'content-length': '1025'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'model_group': 'claude-3-sonnet-20240229'}, 'request_timeout': 600}
14:53:26 - LiteLLM Router:DEBUG: router.py:2194 - initial list of deployments: [{'model_name': 'claude-3-sonnet-20240229', 'litellm_params': {'model': 'claude-3-sonnet-20240229', 'api_key': 'removed'}, 'model_info': {'id': 'c029cce1-0976-4c42-a3a3-a8a5eb5e407a'}}]
14:53:26 - LiteLLM:DEBUG: caching.py:21 - get cache: cache key: 14-53:cooldown_models; local_only: False
14:53:26 - LiteLLM:DEBUG: caching.py:21 - in_memory_result: None
14:53:26 - LiteLLM:DEBUG: caching.py:21 - get cache: cache result: None
14:53:26 - LiteLLM Router:DEBUG: router.py:1622 - retrieve cooldown models: []
14:53:26 - LiteLLM Router:DEBUG: router.py:2202 - cooldown deployments: []
14:53:26 - LiteLLM Router:DEBUG: router.py:2212 - healthy deployments: length 1 [{'model_name': 'claude-3-sonnet-20240229', 'litellm_params': {'model': 'claude-3-sonnet-20240229', 'api_key': 'removed'}, 'model_info': {'id': 'c029cce1-0976-4c42-a3a3-a8a5eb5e407a'}}]
14:53:26 - LiteLLM:DEBUG: caching.py:21 - get cache: cache key: c029cce1-0976-4c42-a3a3-a8a5eb5e407a_async_client; local_only: True
14:53:26 - LiteLLM:DEBUG: caching.py:21 - in_memory_result: None
14:53:26 - LiteLLM:DEBUG: caching.py:21 - get cache: cache result: None
14:53:26 - LiteLLM:DEBUG: caching.py:21 - get cache: cache key: c029cce1-0976-4c42-a3a3-a8a5eb5e407a_async_client; local_only: True
14:53:26 - LiteLLM:DEBUG: caching.py:21 - in_memory_result: None
14:53:26 - LiteLLM:DEBUG: caching.py:21 - get cache: cache result: None
14:53:26 - LiteLLM:DEBUG: utils.py:831 -
14:53:26 - LiteLLM:DEBUG: utils.py:831 - Request to litellm:
14:53:26 - LiteLLM:DEBUG: utils.py:831 - litellm.acompletion(model='claude-3-sonnet-20240229', api_key='removed', messages=[{'content': "You are in a role play game. The following roles are available:\nUser_Proxy: A computer terminal that performs no other action than running Python scripts (provided to it quoted in python code blocks), or sh shell scripts (provided to it quoted in sh code blocks).\nData_Analysis_agent: You are in a group chat.\n As a Data Analyst, you use your analytics skills to analyze the customer data given to you and provide accurate,\n insightful information, and suggestions ..\n\nRead the following conversation.\nThen select the next role from ['User_Proxy', 'Data_Analysis_agent'] to play. Only return the role.", 'role': 'system'}, {'content': 'Fullfill this question from the rep: hi', 'role': 'user', 'name': 'User_Proxy'}, {'role': 'system', 'content': "Read the above conversation. Then select the next role from ['User_Proxy', 'Data_Analysis_agent'] to play. Only return the role."}], caching=False, client=None, timeout=6000, stream=False, temperature=0, proxy_server_request={'url': 'http://0.0.0.0:4000/chat/completions', 'method': 'POST', 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'OpenAI/Python 1.14.2', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.14.2', 'x-stainless-os': 'MacOS', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.11.7', 'authorization': 'Bearer Notrequired', 'x-stainless-async': 'false', 'content-length': '1025'}, 'body': {'messages': [{'content': "You are in a role play game. The following roles are available:\nUser_Proxy: A computer terminal that performs no other action than running Python scripts (provided to it quoted in python code blocks), or sh shell scripts (provided to it quoted in sh code blocks).\nData_Analysis_agent: You are in a group chat.\n As a Data Analyst, you use your analytics skills to analyze the customer data given to you and provide accurate,\n insightful information, and suggestions ..\n\nRead the following conversation.\nThen select the next role from ['User_Proxy', 'Data_Analysis_agent'] to play. Only return the role.", 'role': 'system'}, {'content': 'Fullfill this question from the rep: hi', 'role': 'user', 'name': 'User_Proxy'}, {'role': 'system', 'content': "Read the above conversation. Then select the next role from ['User_Proxy', 'Data_Analysis_agent'] to play. Only return the role."}], 'model': 'claude-3-sonnet-20240229', 'stream': False, 'temperature': 0}}, metadata={'user_api_key': 'Notrequired', 'user_api_key_alias': None, 'user_api_key_user_id': None, 'user_api_key_team_id': None, 'user_api_key_metadata': {}, 'headers': {'host': '0.0.0.0:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'OpenAI/Python 1.14.2', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.14.2', 'x-stainless-os': 'MacOS', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.11.7', 'x-stainless-async': 'false', 'content-length': '1025'}, 'endpoint': 'http://0.0.0.0:4000/chat/completions', 'model_group': 'claude-3-sonnet-20240229', 'deployment': 'claude-3-sonnet-20240229', 'model_info': {'id': 'c029cce1-0976-4c42-a3a3-a8a5eb5e407a'}, 'caching_groups': None}, request_timeout=600, model_info={'id': 'c029cce1-0976-4c42-a3a3-a8a5eb5e407a'}, max_retries=0)
14:53:26 - LiteLLM:DEBUG: utils.py:831 -
14:53:26 - LiteLLM:DEBUG: utils.py:831 - Initialized litellm callbacks, Async Success Callbacks: [<litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x111ab5690>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x110470750>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x111ab5650>]
14:53:26 - LiteLLM:DEBUG: utils.py:831 - callback: <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x111ab5650>
14:53:26 - LiteLLM:DEBUG: utils.py:831 - callback: <bound method Router.deployment_callback_on_failure of <litellm.router.Router object at 0x111b7f790>>
14:53:26 - LiteLLM:DEBUG: utils.py:831 - callback: <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x111ab5690>
14:53:26 - LiteLLM:DEBUG: utils.py:831 - callback: <bound method Router.deployment_callback_on_failure of <litellm.router.Router object at 0x111b94a90>>
14:53:26 - LiteLLM:DEBUG: utils.py:831 - callback: <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x110470750>
14:53:26 - LiteLLM:DEBUG: utils.py:831 - callback: <bound method ProxyLogging.response_taking_too_long_callback of <litellm.proxy.utils.ProxyLogging object at 0x111ab5210>>
14:53:26 - LiteLLM:DEBUG: utils.py:831 - self.optional_params: {}
14:53:26 - LiteLLM:DEBUG: utils.py:831 - litellm.cache: None
14:53:26 - LiteLLM:DEBUG: utils.py:831 - kwargs[caching]: False; litellm.cache: None
14:53:26 - LiteLLM:DEBUG: utils.py:831 - kwargs[caching]: False; litellm.cache: None
14:53:26 - LiteLLM:DEBUG: utils.py:4375 - LiteLLM completion() model= claude-3-sonnet-20240229; provider = anthropic
14:53:26 - LiteLLM:DEBUG: utils.py:4378 - LiteLLM: Params passed to completion() {'functions': None, 'function_call': None, 'temperature': 0, 'top_p': None, 'n': None, 'stream': False, 'stop': None, 'max_tokens': None, 'presence_penalty': None, 'frequency_penalty': None, 'logit_bias': None, 'user': None, 'model': 'claude-3-sonnet-20240229', 'custom_llm_provider': 'anthropic', 'response_format': None, 'seed': None, 'tools': None, 'tool_choice': None, 'max_retries': 0, 'logprobs': None, 'top_logprobs': None, 'extra_headers': None}
14:53:26 - LiteLLM:DEBUG: utils.py:4381 - LiteLLM: Non-Default params passed to completion() {'temperature': 0, 'stream': False, 'max_retries': 0}
14:53:26 - LiteLLM:DEBUG: utils.py:831 - Final returned optional params: {'temperature': 0}
14:53:26 - LiteLLM:DEBUG: utils.py:831 - self.optional_params: {'temperature': 0}
Looks like when the request contains multiple system messages, LiteLLM just hangs with no feedback. AutoGen creates multiple system messages when using GroupChat.
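For context on why multiple system messages could trip up the Anthropic path: Anthropic's Messages API takes the system prompt as a single top-level `system` parameter rather than as system-role chat messages, so a second, mid-conversation system message has no direct mapping. Below is a sketch of one possible client-side transformation (a hypothetical helper, not necessarily what the upstream fix does):

```python
def split_system_messages(messages):
    """Hypothetical sketch: fold OpenAI-style system messages into the
    single top-level `system` parameter Anthropic's Messages API expects.

    Leading system messages are concatenated into the system prompt;
    any later system message is downgraded to a user turn, since the
    remaining messages must be user/assistant turns.
    """
    system_parts = []   # leading system prompt(s), concatenated
    chat_messages = []  # remaining user/assistant turns
    for msg in messages:
        if msg["role"] == "system" and not chat_messages:
            # Leading system messages become the top-level system prompt.
            system_parts.append(msg["content"])
        elif msg["role"] == "system":
            # Mid-conversation system message: no Anthropic equivalent,
            # so convert it to a user turn as one possible workaround.
            chat_messages.append({"role": "user", "content": msg["content"]})
        else:
            chat_messages.append({"role": msg["role"],
                                  "content": msg["content"]})
    return "\n".join(system_parts), chat_messages
```

Note that for the payload above this leaves two consecutive user turns, which the Messages API may also reject unless they are merged; the actual fix is presumably more complete.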
@venkat-arvo thanks for the investigation - i'll look into this today
able to repro the hang
fix pushed @venkat-arvo - https://github.com/BerriAI/litellm/commit/691a83b7dcd6461fb7f780a156076b195e8c8bb1
Should be live in the next release - v1.33.5