Add support for Ollama, Palm, Claude-2, Cohere, Replicate Llama2, CodeLlama, Hugging Face (100+LLMs) - using LiteLLM
Why are these changes needed?
This PR adds support for the above-mentioned LLMs using LiteLLM (https://github.com/BerriAI/litellm/). LiteLLM is a lightweight package that simplifies LLM API calls - use any LLM as a drop-in replacement for gpt-3.5-turbo.
Example
import os
from litellm import completion
## set ENV variables
os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["COHERE_API_KEY"] = "cohere key"
messages = [{ "content": "Hello, how are you?","role": "user"}]
# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)
# cohere call
response = completion(model="command-nightly", messages)
# anthropic call
response = completion(model="claude-instant-1", messages=messages)
Related issue number
Checks
- [ ] I've included any doc changes needed for https://microsoft.github.io/autogen/. See https://microsoft.github.io/autogen/docs/Contribute#documentation to build and test documentation locally.
- [ ] I've added tests (if relevant) corresponding to the changes introduced in this PR.
- [ ] I've made sure all auto checks have passed.
Addressing: https://github.com/microsoft/autogen/issues/44 https://github.com/microsoft/autogen/issues/45 https://github.com/microsoft/autogen/issues/34 https://github.com/microsoft/autogen/issues/46
We also support tracking max_tokens, cost, and caching. I noticed this repo has some utils for this: https://docs.litellm.ai/docs/token_usage
@sonichi @thinkall can I get a review on this PR?
Thank you. I'm looking for reviewers. Are you on discord?
~~Model names don’t always relate to litellm config names. Not sure if using the model name is the perfect variable fit here~~
A substitution with "litellm.completion" may not be adequate in this scenario. We may need additional checks. The OpenAI call is still preferred in most cases.
Another suggestion is: we should integrate LiteLLM natively with OAI_CONFIG_LIST, so that users don't need to worry about the backend.
@sonichi I'm on discord, my username is: 'ishaanberri.ai'. If you can't find that, the LiteLLM discord is here: https://discord.com/invite/wuPM9dRgDw
I can DM you once you join
Model names don’t always relate to litellm config names. Not sure if using the model name is the perfect variable fit here
@derekbar90 what do you mean by this?
Another suggestion is: we should integrate LiteLLM natively with OAI_CONFIG_LIST, so that users don't need to worry about the backend.
@BeibinLi that's a great suggestion I can tackle that in this PR. I can read the OAI_CONFIG_LIST and pass that to the litellm.completion call
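As a rough illustration of that idea (not the final PR code - config_list_from_json and litellm.completion are the only names taken from this thread; everything else is a hypothetical sketch):
import autogen
import litellm

# load configs the same way autogen users already do
config_list = autogen.config_list_from_json(env_or_file="OAI_CONFIG_LIST")
messages = [{"role": "user", "content": "Hello, how are you?"}]
for config in config_list:
    # each entry already carries the model name and key, so it can be passed straight through
    response = litellm.completion(
        model=config["model"],
        api_key=config.get("api_key"),
        messages=messages,
    )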
I just made it work with litellm, and it's great! But I had to change quite a lot of code from the project...
@inconnu26 You beat me to it, I was just in the middle of making some code changes - can you share the changes you made to this PR? I think @ishaan-jaff will need to give permission to work off his branch.
You are adding/importing "litellm" in "completion.py" but did not adjust the requirement files (like adding "litellm" so it gets installed via pip)?
Hey @inconnu26, is there a best way to communicate? If there are any improvements we can make on our end to reduce code changes - I'd love to help with that.
Alternatively if you have a way I can see the code - that'd be helpful!
- can you share the changes you made to this PR? I think @ishaan-jaff will need to give permission to work off his branch.
done @AaronWard
Can confirm this works with palm/chat-bison now too!
Codecov Report
Merging #95 (f433e33) into main (294e006) will decrease coverage by 2.50%. The diff coverage is 6.66%.
@@ Coverage Diff @@
## main #95 +/- ##
==========================================
- Coverage 43.31% 40.82% -2.50%
==========================================
Files 17 17
Lines 2133 2141 +8
Branches 481 482 +1
==========================================
- Hits 924 874 -50
- Misses 1126 1180 +54
- Partials 83 87 +4
Flag | Coverage Δ |
---|---|
unittests | 40.82% <6.66%> (-2.41%) :arrow_down: |
Flags with carried forward coverage won't be shown.
Files | Coverage Δ |
---|---|
autogen/oai/completion.py | 18.43% <6.66%> (-0.09%) :arrow_down: |
working with huggingface/glaive-coder
I'm going through a few things and I spotted a few issues:
Issue 1: config_list_from_dotenv
- config_list_from_dotenv acts differently since the updates to completion.py: it works when I have an OPENAI_API_KEY in my .env file, but it still uses the Hugging Face model (with the bad outputs explained in Issue 2).
- When I don't have an OPENAI_API_KEY in my .env file and I specify a Hugging Face model, the coding_assistant goes into a continuous loop of restating the user input. (See output below.)
- I've made sure to clear the cache after each run to ensure I'm not picking up any previous conversations mistakenly.
- As you'll see, it will work with config_list_from_json, so there appears to be something isolated to config_list_from_dotenv that is causing the issue, as it depends on config_list_from_json.
Code 👨‍💻
import os
import shutil
from pathlib import Path
def clear_cache():
    # Function for cleaning up the cache to
    # avoid potential spill of conversation
    # between models.
    # Should be run before and after each chat initialization.
    folder_path = '.cache'
    if os.path.exists(folder_path) and os.path.isdir(folder_path):
        shutil.rmtree(folder_path)
import autogen
from autogen import AssistantAgent, UserProxyAgent
clear_cache()
# config_list_from_dotenv no OPENAI_API_KEY
config_list = autogen.config_list_from_dotenv(
    dotenv_file_path='../.env',
    model_api_key_map={
        # "gpt-4": "OPENAI_API_KEY",
        "huggingface/mistralai/Mistral-7B-v0.1": "HUGGINGFACE_HUB",
    },
    filter_dict={
        "model": {
            # "gpt-4",
            "huggingface/mistralai/Mistral-7B-v0.1",
        }
    }
)
print(config_list)
coding_assistant = AssistantAgent(
    name="coding_assistant",
    llm_config={
        "request_timeout": 1000,
        "seed": 42,
        "config_list": config_list,
        "temperature": 0.4,
    },
)
coding_runner = UserProxyAgent(
    name="coding_runner",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=3,
    is_termination_msg=lambda x: x.get("content", "").rstrip().endswith("TERMINATE"),
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False
    },
)
coding_runner.initiate_chat(coding_assistant, message="Calculate the percentage gain YTD for Berkshire Hathaway stock, save to png.")
clear_cache()
No OpenAI key in .env file - open-sourced model
[{'api_key': 'hf_YFaO*******', 'model': 'huggingface/mistralai/Mistral-7B-v0.1'}]
coding_runner (to coding_assistant):
Calculate the percentage gain YTD for Berkshire Hathaway stock, save to png.
--------------------------------------------------------------------------------
coding_assistant (to coding_runner):
Calculate the percentage gain YTD for Berkshire Hathaway stock, save to
--------------------------------------------------------------------------------
coding_runner (to coding_assistant):
--------------------------------------------------------------------------------
coding_assistant (to coding_runner):
Calculate the percentage gain YTD for Berkshire Hathaway stock, save to
--------------------------------------------------------------------------------
coding_runner (to coding_assistant):
--------------------------------------------------------------------------------
coding_assistant (to coding_runner):
Calculate the percentage gain YTD for Berkshire Hathaway stock, save to
--------------------------------------------------------------------------------
coding_runner (to coding_assistant):
--------------------------------------------------------------------------------
coding_assistant (to coding_runner):
Calculate the percentage gain YTD for Berkshire Hathaway stock, save to
--------------------------------------------------------------------------------
Issue 2: Outputs being parsed incorrectly
- I am able to connect and make requests to a Hugging Face model, but it seems as though the outputs are being parsed incorrectly, possibly preventing the user proxy from having any code to run. So it just goes in a loop of giving snippets of responses.
- I ran it in the terminal to ensure it wasn't just a Jupyter notebook issue; same output.
- When I run the same code but with GPT-4 it works fine, leading me to believe that the outputs are being parsed incorrectly.
- The work_dir also isn't generated even though it's specified.
Code 👨💻
import os
import shutil
from pathlib import Path
import autogen
from autogen import AssistantAgent, UserProxyAgent
def clear_cache():
    # Function for cleaning up the cache to
    # avoid potential spill of conversation
    # between models.
    # Should be run before and after each chat initialization.
    folder_path = '.cache'
    if os.path.exists(folder_path) and os.path.isdir(folder_path):
        shutil.rmtree(folder_path)
clear_cache()
config_list = autogen.config_list_from_json(
    env_or_file='OAI_CONFIG_LIST',
    filter_dict={
        "model": ["huggingface/mistralai/Mistral-7B-v0.1"],
    },
)
print(config_list)
coding_assistant = AssistantAgent(
    name="coding_assistant",
    llm_config={
        "request_timeout": 1000,
        "seed": 42,
        "config_list": config_list,
        "temperature": 0.4,
    },
)
coding_runner = UserProxyAgent(
    name="coding_runner",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=30,
    # is_termination_msg=lambda x: x.get("message", "").rstrip().endswith("TERMINATE"),
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False
    },
)
coding_runner.initiate_chat(coding_assistant, message="Calculate the percentage gain YTD for Berkshire Hathaway stock and plot a chart to linechart.png")
Output👾
[{'model': 'huggingface/mistralai/Mistral-7B-v0.1', 'api_key': 'hf_Y****'}]
coding_runner (to coding_assistant):
Calculate the percentage gain YTD for Berkshire Hathaway stock and plot a chart to linechart.png
--------------------------------------------------------------------------------
coding_assistant (to coding_runner):
.
1. Use the following code to get the stock data.
2. Use
--------------------------------------------------------------------------------
coding_runner (to coding_assistant):
--------------------------------------------------------------------------------
coding_assistant (to coding_runner):
2017-01-01 as the start date and 2020
--------------------------------------------------------------------------------
coding_runner (to coding_assistant):
--------------------------------------------------------------------------------
coding_assistant (to coding_runner):
01-01 as the end date.
3. Calculate the percentage gain Y
--------------------------------------------------------------------------------
coding_runner (to coding_assistant):
--------------------------------------------------------------------------------
coding_assistant (to coding_runner):
2017-01-01 to 2020-01-
--------------------------------------------------------------------------------
coding_runner (to coding_assistant):
--------------------------------------------------------------------------------
coding_assistant (to coding_runner):
1.
4. Save the result to a file called result.csv.
5
--------------------------------------------------------------------------------
coding_runner (to coding_assistant):
--------------------------------------------------------------------------------
coding_assistant (to coding_runner):
Terminate.
6.
7.
8.
9.
--------------------------------------------------------------------------------
coding_runner (to coding_assistant):
--------------------------------------------------------------------------------
coding_assistant (to coding_runner):
# filename: result.csv
import yfinance as yf
import pandas
--------------------------------------------------------------------------------
coding_runner (to coding_assistant):
--------------------------------------------------------------------------------
coding_assistant (to coding_runner):
as pd
import matplotlib.pyplot as plt
import numpy as np
--------------------------------------------------------------------------------
coding_runner (to coding_assistant):
--------------------------------------------------------------------------------
coding_assistant (to coding_runner):
from datetime import datetime
# get the stock data
ticker = 'BRK.
--------------------------------------------------------------------------------
coding_runner (to coding_assistant):
--------------------------------------------------------------------------------
coding_assistant (to coding_runner):
A'
start = datetime(2017, 1, 1)
--------------------------------------------------------------------------------
coding_runner (to coding_assistant):
--------------------------------------------------------------------------------
coding_assistant (to coding_runner):
end = datetime(2020, 1, 1)
data = y
--------------------------------------------------------------------------------
coding_runner (to coding_assistant):
- I went through the example notebooks again and tested loading configurations to see what the issue was. I think the problem is that when you specify an open-sourced model, it orders the dictionary differently within config_list.
- Perhaps this is messing up the order in which a chat between agents plays out, I'm not sure (a possible workaround is sketched after the examples below):
"model": {
"gpt-4",
"gpt-3.5-turbo",
# "huggingface/mistralai/Mistral-7B-v0.1",
}
[{'model': 'gpt-4', 'api_key':'sk-n**************'}, {'model': 'gpt-3.5-turbo', 'api_key': 'sk-n**************'}]
With an open-sourced model:
filter_dict={
    "model": {
        "gpt-4",
        # "gpt-3.5-turbo",
        "huggingface/mistralai/Mistral-7B-v0.1",
    }
}
[{'model': 'huggingface/mistralai/Mistral-7B-v0.1', 'api_key': 'hf_Y**********'}, {'model': 'gpt-4', 'api_key': 'sk-*****'}]
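If the ordering really is the culprit, one low-tech workaround (purely illustrative, not part of this PR and not a root-cause fix) is to reorder the loaded config_list so the preferred models are tried first:
# hypothetical workaround: put OpenAI entries ahead of hosted open-source ones
preferred = ["gpt-4", "gpt-3.5-turbo"]
config_list.sort(key=lambda c: preferred.index(c["model"]) if c["model"] in preferred else len(preferred))
print([c["model"] for c in config_list])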
@BeibinLi @sonichi @gagb @ishaan-jaff @krrishdholakia
I'm taking a look now. There could also be another issue with the tune call here, given it's not always going to be OpenAI models.
Issue 3: Tuning for model hyperparameters is hardcoded to OpenAI
- There could also be another issue with the tune call here, given it's not always going to be OpenAI models. There needs to be a check here to only do this when OpenAI models are passed, or alternatively have a default_search_space per model type (a sketch of a model-aware search space follows the parameter listing below).
class ChatCompletion(Completion):
    """A class for OpenAI API ChatCompletion. Share the same API as Completion."""

    default_search_space = Completion.default_search_space.copy()
    default_search_space["model"] = tune.choice(["gpt-3.5-turbo", "gpt-4"])
    openai_completion_class = not ERROR and openai.ChatCompletion
what are these parameters based on?
# completion.py
default_search_space = {
    "model": tune.choice(
        [
            "text-ada-001",
            "text-babbage-001",
            "text-davinci-003",
            "gpt-3.5-turbo",
            "gpt-4",
        ]
    ),
    "temperature_or_top_p": tune.choice(
        [
            {"temperature": tune.uniform(0, 2)},
            {"top_p": tune.uniform(0, 1)},
        ]
    ),
    "max_tokens": tune.lograndint(50, 1000),
    "n": tune.randint(1, 100),
    "prompt": "{prompt}",
}
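One possible shape for a model-aware search space, as mentioned in Issue 3 above. This is only a sketch: Completion.default_search_space and tune.choice come from the snippets quoted here, but the helper itself is hypothetical and not part of autogen:
def search_space_for(config_list):
    # hypothetical helper: only tune over the models the user actually configured
    space = Completion.default_search_space.copy()
    space["model"] = tune.choice([config["model"] for config in config_list])
    return space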
For Issue 2, I think it is because different models could use different representations (syntax) of messages and outputs. For some small models, I observed that they will not use code blocks (``` blocks) even when writing code, which can be a problem when solving complex tasks.
@sonichi there's a lot going on in this thread. Can you help me understand the blockers to merging this PR?
Happy to address them
I've confirmed it works locally for me
Hi Everyone! @ishaan-jaff @sonichi @qingyun-wu
Thanks so much for helping integrate LiteLLM into AutoGen! After closely reviewing the code, I'd like to kindly suggest a slight modification: we could utilize the "api_type" variable for OpenAI/LiteLLM.
For instance, we could modify the code:
response = litellm.completion(**config)
into:
api_type = config.get("api_type", None)
if api_type and re.sub(r'[^a-zA-Z0-9]', '', api_type).lower() == "litellm":
    response = litellm.completion(**config)
else:
    response = openai.completion(**config)
Subsequently, the configuration file could be updated to:
[
    {
        "model": "gpt-4",
        "api_key": "<your OpenAI API key here>",
    },
    {
        "model": "gpt-35-turbo",
        "api_key": "<your Azure API key here>",
        "api_type": "azure",
    },
    {
        "model": "command-nightly",
        "api_key": "<your Cohere API key here>",
        "api_type": "Lite-llm"
    },
    {
        "model": "palm/chat-bison",
        "api_key": "<your PaLM API key here>",
        "api_type": "LiteLLM"
    }
]
By taking this approach, users could benefit from clearer readability regarding the API directly from the CONFIG file. Moreover, this modification might help them to control package versions better, particularly when OpenAI updates their API.
What do you think? Thank you for considering this suggestion.
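To illustrate the readability point: with an api_type field, selecting the LiteLLM-backed entries stays a one-liner. This is only a hedged sketch, assuming config_list is the list above loaded as Python dicts; the normalization mirrors the regex in the snippet above:
litellm_configs = [c for c in config_list if c.get("api_type", "").replace("-", "").lower() == "litellm"]
# -> keeps the command-nightly and palm/chat-bison entries from the list above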
+1 for keeping the choice of using openai. +@victordibia
@BeibinLi @sonichi I addressed your feedback and updated the PR.
Let me know what else is missing.
It will probably be a good idea to spend a considerable amount of time and effort figuring out whether or not this integration is even a good idea. At the very least its use should be completely optional - as an [extra] dependency, even better. I have only just begun reviewing the PR, and am not yet making any claims about it.
Has such a discussion taken place? Much scrutiny is warranted. Let's not pull a langchain.
I agree that the dependency should be optional and only be used for models that do not support an OpenAI-compatible API.
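A minimal sketch of what "optional" could look like, assuming litellm ends up as an extra. The extra name, package name, and helper below are assumptions for illustration, not the merged code:
# setup.py - hypothetical extra so `pip install pyautogen[litellm]` would pull it in
# extras_require={"litellm": ["litellm"]},

# autogen/oai/completion.py - guard the import so the base install is unaffected
try:
    import litellm
except ImportError:
    litellm = None

def _litellm_completion(config):
    # hypothetical helper: only used for models without an OpenAI-compatible API
    if litellm is None:
        raise ImportError("litellm is required for non-OpenAI models: pip install pyautogen[litellm]")
    return litellm.completion(**config)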
@sonichi @BeibinLi @ishaan-jaff what are next steps on this PR?
Note: If you're trying to use non-openai models while this PR is pending, here's a tutorial
@sonichi @BeibinLi @ishaan-jaff what are next steps on this PR?
Could you make litellm an optional dependency and document for which models litellm are required? You can document that in https://microsoft.github.io/autogen/docs/Installation#optional-dependencies
Also, please address @TomExMachina 's comments in the PR about the exception.
@sonichi Thanks for the reply.
Here's what I understand:
Action Items
- Make litellm an optional dependency
- Document that in this section - https://microsoft.github.io/autogen/docs/Installation#optional-dependencies
Next Steps
- Upon completion of action items (stated above), this PR will be merged.
@sonichi @BeibinLi can you please confirm this? Clarity on the final requirements would be appreciated.
@TomExMachina do you have any concern about this plan?