Catch token count issue while streaming with customized models
When llama, llava, phi, or other customized models are used with streaming (stream=True), the current client crashes after fetching the response because it cannot compute the token count. A warning is enough in this case, just as in the non-streaming code path.
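A minimal sketch of the intended behavior (the helper and attribute names here are illustrative, not the actual autogen/oai/client.py internals): when token usage cannot be read from a streamed response, emit a warning and fall back to zero counts instead of raising.

```python
import logging

logger = logging.getLogger(__name__)


def usage_from_streamed_response(response):
    """Hypothetical helper: extract token usage from a streamed response.

    Some OpenAI-compatible servers (e.g. llama, llava, or phi served from a
    local host) omit the `usage` object when stream=True, which currently raises.
    """
    try:
        usage = response.usage  # may be None/absent for streamed custom-model responses
        return {
            "prompt_tokens": usage.prompt_tokens,
            "completion_tokens": usage.completion_tokens,
            "total_tokens": usage.total_tokens,
        }
    except (AttributeError, TypeError):
        # Same treatment as the non-streaming path: warn and fall back to zeros.
        logger.warning(
            "Cannot retrieve token usage from the streamed response; "
            "reporting 0 tokens for this call."
        )
        return {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
```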
Why are these changes needed?
Related issue number
Checks
- [ ] I've included any doc changes needed for https://microsoft.github.io/autogen/. See https://microsoft.github.io/autogen/docs/Contribute#documentation to build and test documentation locally.
- [ ] I've added tests (if relevant) corresponding to the changes introduced in this PR.
- [ ] I've made sure all auto checks have passed.
Codecov Report
Attention: Patch coverage is 0% with 5 lines in your changes missing coverage. Please review.
Project coverage is 21.29%. Comparing base (6aaa238) to head (7d1a110). Report is 4 commits behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| autogen/oai/client.py | 0.00% | 5 Missing :warning: |
Additional details and impacted files
Coverage Diff

| | main | #3241 | +/- |
|---|---|---|---|
| Coverage | 33.24% | 21.29% | -11.95% |
| Files | 99 | 99 | |
| Lines | 11016 | 11020 | +4 |
| Branches | 2365 | 2537 | +172 |
| Hits | 3662 | 2347 | -1315 |
| Misses | 7026 | 8507 | +1481 |
| Partials | 328 | 166 | -162 |
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 21.26% <0.00%> (-11.99%) :arrow_down: | |
Flags with carried forward coverage won't be shown. Click here to find out more.
Hey @BeibinLi, can you share an LLM config that is crashing? Is it using the standard OpenAI client class?
@marklysze Yes, let's say I am using Ollama with "stream=True". Here is a simple script to reproduce the error:
from autogen import AssistantAgent, UserProxyAgent
config_list = [
{
"model": "llama3.1:70b",
"api_key": "ollama",
"base_url": "http://127.0.0.1:13579/v1"
}
]
llm_config = {"config_list": config_list, "stream": True}
assistant = AssistantAgent(name="assistant", llm_config=llm_config)
user_proxy = UserProxyAgent(name="user", human_input_mode="NEVER", max_consecutive_auto_reply=1)
chat_res = user_proxy.initiate_chat(assistant, message="How are you")
Thanks @BeibinLi, I changed the config a bit (api_key to api_type, and I can't run 70b so running 8b) to use the Ollama client from PR #3056:
from autogen import AssistantAgent, UserProxyAgent
config_list = [
{
"model": "llama3.1:8b-instruct-q8_0",
"api_type": "ollama",
"client_host": "http://192.168.0.115:11434",
}
]
llm_config = {"config_list": config_list, "stream": True}
assistant = AssistantAgent(name="assistant", llm_config=llm_config)
user_proxy = UserProxyAgent(name="user", human_input_mode="NEVER", max_consecutive_auto_reply=1)
chat_res = user_proxy.initiate_chat(assistant, message="How are you")
And it runs through okay for me.
For your original config, is that trying to use Ollama with the default client?
@marklysze Yes, I was using the original client, and your "api_type" hack works. Would it also work for LM Studio or other local hosts?
@BeibinLi, I don't think Ollama's REST API is fully compatible with the OpenAI API. The Ollama PR #3056 uses the Ollama Python library instead.
So, I'm not surprised the AutoGen default client will fail when trying to use Ollama's REST API... do you think we should try to cater for this and catch the error? I'm thinking we can steer people to use the Ollama client class (e.g. pip install pyautogen[ollama]) when it's ready.
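A rough sketch of that steering idea (the handler name and detection heuristic below are hypothetical, not existing autogen code): when the default client fails to read streamed usage and the endpoint looks like an Ollama server, warn and point the user at the dedicated client class.

```python
import logging

logger = logging.getLogger(__name__)


def warn_on_stream_usage_failure(config: dict, error: Exception) -> None:
    """Hypothetical handler: downgrade the streaming-usage crash to a hint."""
    base_url = config.get("base_url", "")
    if "ollama" in base_url.lower() or ":11434" in base_url:
        # Ollama's REST API is not fully OpenAI-compatible; suggest the native client.
        logger.warning(
            "Could not read token usage from the streamed response of %s (%s). "
            "Consider the Ollama client class (pip install pyautogen[ollama]) "
            "with api_type='ollama' in your config.",
            base_url,
            error,
        )
    else:
        logger.warning("Could not read token usage from the streamed response: %s", error)
```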
We don't have customized clients for 01/Yi, LM Studio, TogetherAI, and many other providers, so they all go through the classic OAI client by default. Unless we want to reroute all of that traffic to the Ollama client, developers have to handle the streaming issue themselves. Alternatively, it is also OK to leave the exception to developers and let them create their own clients.
@sonichi @qingyun-wu What do you think about this design issue?
Yes, I wouldn't recommend using the Ollama client for anything other than Ollama (because it will have its own idiosyncrasies). Just a note: we do have a Together.AI client class, but you are right, anything we don't have a client for will go through the default OAI one.