Add support for Ollama, PaLM, Claude-2, Cohere, Replicate, Llama2, CodeLlama (100+ LLMs) - using LiteLLM
This PR adds support for the above-mentioned LLMs using LiteLLM (https://github.com/BerriAI/litellm/). LiteLLM is a lightweight package that simplifies LLM API calls: use any LLM as a drop-in replacement for gpt-3.5-turbo.
Example
import os
from litellm import completion

## set ENV variables
os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["COHERE_API_KEY"] = "cohere key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages=messages)

# anthropic call
response = completion(model="claude-instant-1", messages=messages)
@thinkwee @qianc62 can I get a review on this PR?
Happy to add docs/testing on this too if this initial commit looks good.
Thank you. Does litellm support more personalized parameters, such as temperature, top_n, etc.?
> Thank you. Does litellm support more personalized parameters, such as temperature, top_n, etc.?
Yes. The following example sets temperature, which shifts the probability distribution; the request body spec includes top_p as well.
import os
from litellm import completion
os.environ["OPENAI_API_KEY"] = ""
os.environ["OPENAI_API_BASE"] = "https://api.openai.com/v1"
os.environ["MODEL"] = "gpt-3.5-turbo"
response = completion(
    model=os.getenv("MODEL"),
    messages=[{"content": "The sky is", "role": "user"}],
    temperature=0.8,
    max_tokens=80,
    api_base=os.getenv("OPENAI_API_BASE"),
    request_timeout=300,
)
Stoked to see this PR get merged!
bump @ishaan-jaff
> Thank you. Does litellm support more personalized parameters, such as temperature, top_n, etc.?
@qianc62 yes, we support all params OpenAI supports, plus you can pass provider-specific params if necessary. More info here: https://docs.litellm.ai/docs/completion/input
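As a minimal sketch of what that looks like (param names follow the docs linked above; the `metadata` field and the model choice here are illustrative, and the actual call requires a valid API key for the chosen provider):

```python
# Sketch only: standard OpenAI-style params and extra fields side by side.
# Build the kwargs separately so the shape is clear.
kwargs = {
    "model": "claude-instant-1",
    "messages": [{"role": "user", "content": "The sky is"}],
    "temperature": 0.8,  # standard OpenAI-style parameter
    "top_p": 0.9,        # standard OpenAI-style parameter
    "max_tokens": 80,
}

# Uncomment with a valid ANTHROPIC_API_KEY set in the environment:
# from litellm import completion
# response = completion(**kwargs)
print(sorted(kwargs))
```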
@qianc62 any blockers to merging? Anything you need from me?
A couple of things to update here. I got my Mistral 7B models to work with LiteLLM (+ Ollama).
First problem: I needed to ignore OPENAI_API_KEY by setting it to some arbitrary value.
Second problem: ChatDev was sending too many arguments to Ollama, which I handled with:
import litellm
litellm.drop_params = True
Third problem: As I don't know how to create a real model class for the LiteLLM models with all the required information, I just used GPT_3_5_TURBO as my model, but then in model_backend.py I replaced the response with:
response = litellm.completion(*args, **kwargs, model="ollama/my_local_model", api_base="http://localhost:11434", **self.model_config_dict)
Fourth (bigger) problem I encountered: LiteLLM's OpenAI API seems to be a newer version than ChatDev's, which causes the response (completion) to return "logprobs" inside the "choices" list back to ChatDev, which then causes multiple errors as ChatDev doesn't support logprobs. With a crude hack (removing "logprobs" from the response) I managed to get past this error.
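For reference, the crude hack described above can be sketched like this (hypothetical helper; field names follow the OpenAI-style response shape, and where "logprobs" actually lands may differ by version):

```python
def strip_logprobs(response: dict) -> dict:
    """Remove the 'logprobs' field from each choice so that older
    OpenAI-response consumers (like ChatDev here) don't choke on it."""
    for choice in response.get("choices", []):
        choice.get("message", {}).pop("logprobs", None)
        choice.pop("logprobs", None)  # some versions put it on the choice itself
    return response

# Toy response illustrating the shape:
resp = {"choices": [{"message": {"role": "assistant", "content": "hi", "logprobs": None}}]}
strip_logprobs(resp)
print(resp)
```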
Anyway here is the early chat with my Mistral 7B (Chief Product Officer) writing some crude code for my request.
Hey @venim1103 did the proxy not work for you?
I am extremely interested in this PR
Hey @venim1103 i've filed your issue re: logprobs. I'll make sure we have a fix for this on our (litellm) end.
Extremely sorry for the frustration that must've caused.
@ishaan-jaff
https://github.com/OpenBMB/ChatDev/pull/53#issue-1894791424
Wait, we don't need to change openai_api_base to local url?
@krrishdholakia Thank you! As I only tried to get things running as fast as possible (hacking things together), I didn't test any proxy; I just hard-coded my local model name (that I made using Ollama) into the "response request". When I was using AutoGen with LiteLLM, I just had to put all the model info into OAI_CONFIG_LIST (like "model", "api_base" and "api_type"), but in ChatDev I didn't know how or where to put all this info, so I just did that hack for now...
Anyway, my initial testing with the Mistral 7B model has some issues (the model itself doesn't really understand the "<INFO" context and is mostly too chatty or starts changing the subject too early, thus not moving through the process).
Hey guys, so here is a list of changes I made to get it up and running with a self-hosted LLM (i.e. HF text-generation-inference).
However, I need help if someone could replicate my issues. I built ChatDev inside a Docker container; Dockerfile provided: Dockerfile.txt
When I run everything with networking turned on in the Docker container, everything works fine as it should. However, when I isolate the self-hosted LLM and the Docker container to their own isolated Docker network, things start to break. I don't know if the issue is with litellm or ChatDev. I think I narrowed it down to the usage of tiktoken, but because the code has a lot of try/except it's hard to find out where the failure is happening; it's a 'silent failure', so it's hard to spot. Any help would be appreciated.
the .log error only says this: [2023-12-10 16:39:09 WARNING] expected string or buffer, retrying in 0 seconds...
[UPDATE] I think the issue could be in my changes to this line, currently troubleshooting :-/

# output_messages = [ChatMessage(role_name=self.role_name, role_type=self.role_type, meta_dict=dict(), **dict(choice["message"])) for choice in response["choices"]]
output_messages = [ChatMessage(role_name=self.role_name, role_type=self.role_type, meta_dict=dict(), **{k: v for k, v in choice["message"].items() if k != "logprobs"}) for choice in response["choices"]]
But I don't understand how going 'offline' changes this?
[SECOND UPDATE] Since I am running in a container, when I run the app it looks like there is a library trying to reach the internet, and that is where things are tripping up. Something called zeet-berri.zeet.app? The IPs resolved to amazonaws: ec2-52-37-239-96.us-west-2.compute.amazonaws.com, ec2-35-86-16-11.us-west-2.compute.amazonaws.com
Running on a connected container!

ss -atp | grep -i slirp4netns
ESTAB 0 0 192.168.1.169:37824 52.37.239.96:https users:(("slirp4netns",pid=949054,fd=10))
ESTAB 0 0 10.10.10.1:33630 10.10.10.2:webcache users:(("slirp4netns",pid=949054,fd=13))
ESTAB 0 0 192.168.1.169:51482 35.86.16.11:https users:(("slirp4netns",pid=949054,fd=12))
Running on an 'isolated' container!
ss -anput |grep slirp4netns
udp UNCONN 0 0 0.0.0.0:37887 0.0.0.0:* users:(("slirp4netns",pid=1189494,fd=7))
udp UNCONN 0 0 0.0.0.0:46187 0.0.0.0:* users:(("slirp4netns",pid=1189494,fd=4))
udp UNCONN 0 0 0.0.0.0:46274 0.0.0.0:* users:(("slirp4netns",pid=1189494,fd=10))
udp UNCONN 0 0 0.0.0.0:50229 0.0.0.0:* users:(("slirp4netns",pid=1189494,fd=3))
udp UNCONN 0 0 0.0.0.0:54771 0.0.0.0:* users:(("slirp4netns",pid=1189494,fd=9))
udp UNCONN 0 0 0.0.0.0:59316 0.0.0.0:* users:(("slirp4netns",pid=1189494,fd=8))
Can anyone tell me what is going on here, and whether there is an environment variable I can set to avoid this issue? This might be a metrics-collection thing from litellm? Don't know!
[SOLVED] It looks like there is a bug in litellm. I updated to the latest version and added these to model_backend.py:

import litellm
litellm.set_verbose = False
litellm.drop_params = False
litellm.telemetry = False
litellm.utils.logging.raiseExceptions = False
Also modified mistral prompt with litellm and things started working perfectly.
@cielonet I'm the maintainer of litellm. I can't see the exact issue you faced. Is this because we raise errors for unmapped params?
Some context would be helpful; I'd like to get this fixed on our end ASAP.
@krrishdholakia No prob. I'm currently out of town and will be back on Monday. I'll repost the error msg I was getting.

It looked to me like the msg "expected string or buffer" was generated by litellm because a value (I think it was part of the logging key) in the API call was not correctly formatted. When I ran it with raiseExceptions=False, the API calls never sent that particular field and the system started working again. I did use the logging http copy/paste, so if you have access to the logs/feedback people submit, you should see mine from Thursday when I was working on this (e.g. focus on looking for "expected string or buffer").

Anyway, like I said, I will be back Monday and will provide more feedback. I suggest adding a timeout to your telemetry as well in case the internet is not available, because otherwise it freezes the system, and it was a pain to figure out that the telemetry was causing everything to pause until it found an internet connection. :-/ Thanks again.
how about the PR?
I've tried those changes locally, and running the code with Azure OpenAI Service doesn't seem to work. I'll let you know if I get it to function.
@OhNotWilliam we don't log any of the responses - it's all client-side (even the web URL you saw was just an encoded URL string). If you have the traceback, please let me know - happy to help debug.
We've also had people running this via the local OpenAI-proxy - https://docs.litellm.ai/docs/proxy_server
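For anyone trying that route, a minimal sketch of the setup (the model name and port here are placeholders; see the proxy docs linked above for the real options):

```shell
# Start the LiteLLM proxy in front of a local Ollama model (hypothetical model name):
litellm --model ollama/llama2 --port 8000

# Then point any OpenAI-compatible client at the proxy:
export OPENAI_API_BASE="http://localhost:8000"
export OPENAI_API_KEY="anything"  # local setups typically ignore the key but require it to be set
```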
> I've tried those changes locally and trying to run the code with azure openai service doesn't seem to work. I'll let you know if I get it to function.
@OhNotWilliam: Check my PR #192, which gets Azure working
Any movement on getting this PR merged?
Where do we stand on this? What is still outstanding/how can I help?
@ishaan-jaff
Hi, is this still open? Very confused.
Any update on when this will be implemented?
Ollama announced OpenAI compatibility, making LiteLLM irrelevant here: https://ollama.com/blog/openai-compatibility
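For anyone wanting to try that route, a minimal sketch (endpoint path per the blog post above; the model name is whatever you have pulled locally, and actually sending the request needs a running Ollama server):

```python
import json
import urllib.request

# Ollama exposes an OpenAI-compatible endpoint at /v1/chat/completions.
payload = {
    "model": "llama2",  # placeholder: any model pulled locally with `ollama pull`
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
}
req = urllib.request.Request(
    "http://localhost:11434/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Requires a running Ollama server; uncomment to actually send:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(req.full_url)
```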
@TGM thank you for the heads-up. I think this is great. Thank you.
If anyone has documentation on how to implement what is described in the title of this issue/PR, please share it. Thanks a lot. Happy coding.