
Add support for Ollama, Palm, Claude-2, Cohere, Replicate Llama2 CodeLlama (100+LLMs) - using LiteLLM

Open ishaan-jaff opened this issue 2 years ago • 34 comments

This PR adds support for the above-mentioned LLMs using LiteLLM (https://github.com/BerriAI/litellm/). LiteLLM is a lightweight package that simplifies LLM API calls - use any LLM as a drop-in replacement for gpt-3.5-turbo.

Example

import os
from litellm import completion

## set ENV variables
os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["COHERE_API_KEY"] = "cohere key"

messages = [{ "content": "Hello, how are you?","role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages)

# anthropic call
response = completion(model="claude-instant-1", messages=messages)

ishaan-jaff avatar Sep 13 '23 15:09 ishaan-jaff

@thinkwee @qianc62 can I get a review on this PR?

Happy to add docs/testing on this too if this initial commit looks good.

ishaan-jaff avatar Sep 13 '23 15:09 ishaan-jaff

Thank you. Does litellm support more personalized parameters, such as temperature, top_n, etc.?

qianc62 avatar Sep 19 '23 03:09 qianc62

Thank you. Does litellm support more personalized parameters, such as temperature, top_n, etc.?

Yes. The following example sets temperature, which shifts the sampling probability; the request body spec includes top_p as well.

import os
from litellm import completion

os.environ["OPENAI_API_KEY"] = ""
os.environ["OPENAI_API_BASE"] = "https://api.openai.com/v1"
os.environ["MODEL"] = "gpt-3.5-turbo"

response = completion(
    model = os.getenv('MODEL'),
    messages = [{ "content": "The sky is", "role": "user" }],
    temperature = 0.8,
    max_tokens = 80,
    api_base = os.getenv('OPENAI_API_BASE'),
    request_timeout = 300,
)
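
top_p can be passed the same way; a minimal sketch reusing the variables above (the 0.9 value is just an illustration):

response = completion(
    model = os.getenv('MODEL'),
    messages = [{ "content": "The sky is", "role": "user" }],
    top_p = 0.9,      # nucleus sampling
    max_tokens = 80,
)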

abbott avatar Sep 19 '23 06:09 abbott

Stoked to see this PR get merged!

arch1v1st avatar Oct 02 '23 20:10 arch1v1st

bump @ishaan-jaff

krrishdholakia avatar Oct 04 '23 23:10 krrishdholakia

Thank you. Does litellm support more personalized parameters, such as temperature, top_n, etc.?

@qianc62 yes, we support all params OpenAI supports, and we allow you to pass provider-specific params if necessary. More info here: https://docs.litellm.ai/docs/completion/input
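
An illustrative sketch of that (whether a given provider accepts top_k depends on the provider - see the input docs above):

import os
from litellm import completion

os.environ["ANTHROPIC_API_KEY"] = "anthropic key"

messages = [{ "content": "Hello, how are you?", "role": "user" }]

# standard OpenAI-style params plus a provider-specific one (top_k)
response = completion(
    model="claude-instant-1",
    messages=messages,
    temperature=0.7,
    max_tokens=100,
    top_k=40,  # provider-specific; forwarded to providers that support it
)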

ishaan-jaff avatar Oct 05 '23 23:10 ishaan-jaff

@qianc62 any blockers to merging? Anything you need from me?

ishaan-jaff avatar Oct 05 '23 23:10 ishaan-jaff

A couple of things to update here. I got my Mistral 7B models working with LiteLLM (+ Ollama).

First problem: I needed to ignore OPENAI_API_KEY by setting it to some arbitrary value.

Second problem: ChatDev was sending too many arguments to Ollama, which I handled with:

import litellm
litellm.drop_params = True

Third problem: As I don't know how to create a real model class for the LiteLLM models with all the required information, I just used GPT_3_5_TURBO as my model, but then in model_backend.py I replaced the response with:

response = litellm.completion(*args, **kwargs, model="ollama/my_local_model", api_base="http://localhost:11434", **self.model_config_dict)

Fourth (bigger) problem I encountered: LiteLLM's OpenAI API seems to be a newer version than ChatDev's, which causes the response (completion) to return "logprobs" inside the "choices" list back to ChatDev, which then causes multiple errors as ChatDev doesn't support logprobs. With a crude hack (removing "logprobs" from the response) I managed to get past this error.
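
A rough sketch of how those workarounds fit together (the model name, API base, and the dict-style response access are assumptions based on this thread, not a tested patch):

import litellm

litellm.drop_params = True  # silently drop params the Ollama backend does not accept

def local_completion(messages):
    # route the call to a local Ollama model instead of OpenAI
    response = litellm.completion(
        model="ollama/my_local_model",      # placeholder local model name
        messages=messages,
        api_base="http://localhost:11434",  # default Ollama address
    )
    # crude hack: strip the "logprobs" field ChatDev's older OpenAI handling does not expect
    for choice in response["choices"]:
        if "logprobs" in choice:
            del choice["logprobs"]
    return response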

Anyway, here is an early chat with my Mistral 7B (Chief Product Officer) writing some crude code for my request.

[image: screenshot of the chat]

venim1103 avatar Oct 10 '23 19:10 venim1103

Hey @venim1103 did the proxy not work for you?

krrishdholakia avatar Oct 10 '23 20:10 krrishdholakia

I am extremely interested in this PR

milorddev avatar Oct 11 '23 01:10 milorddev

Hey @venim1103, I've filed your issue re: logprobs. I'll make sure we have a fix for this on our (litellm) end.

Extremely sorry for the frustration that must've caused.

krrishdholakia avatar Oct 11 '23 01:10 krrishdholakia

@ishaan-jaff

https://github.com/OpenBMB/ChatDev/pull/53#issue-1894791424

Wait, we don't need to change openai_api_base to a local URL?

yhyu13 avatar Oct 11 '23 06:10 yhyu13

@krrishdholakia Thank you! As I was only trying to get things running as fast as possible (hacking things together), I didn't test any proxy; I just hard-coded my local model name (created with Ollama) into the "response request". When I was using AutoGen with LiteLLM, I just had to put all the model info into OAI_CONFIG_LIST (like the "model", "api_base" and "api_type"), but in ChatDev I didn't know how or where to put all this info, so I just did that hack for now...

Anyway, my initial testing with the Mistral 7B model has some issues (the model itself doesn't really understand the "<INFO" context and is mostly too chatty, or starts changing the subject too early, thus not moving through the process).

venim1103 avatar Oct 11 '23 07:10 venim1103

Hey guys, so here is a list of changes I made to get it up and running with a self-hosted LLM (i.e. HF text-generation-inference).

litellm-changes.diff

However, I need help - could someone replicate my issues? I built ChatDev inside a Docker container; Dockerfile provided: Dockerfile.txt

When I run everything with networking turned on in the Docker container, everything works fine as it should. However, when I isolate the self-hosted LLM and the Docker container to their own isolated Docker network, things start to break. I don't know if the issue is with LiteLLM or ChatDev. I think I narrowed it down to the usage of tiktoken, but because the code has a lot of try/except, it's hard to find out where the failure is happening - it's a 'silent failure', so it's hard to spot. Any help would be appreciated.

The .log error only says this: [2023-12-10 16:39:09 WARNING] expected string or buffer, retrying in 0 seconds...

[UPDATE] I think the issue could be in my changes to this line; currently troubleshooting :-/

# output_messages = [ChatMessage(role_name=self.role_name, role_type=self.role_type, meta_dict=dict(), **dict(choice["message"])) for choice in response["choices"]]
output_messages = [ChatMessage(role_name=self.role_name, role_type=self.role_type, meta_dict=dict(), **{k: v for k, v in choice["message"].items() if k != "logprobs"}) for choice in response["choices"]]

But I don't understand how going 'offline' changes this?

[SECOND UPDATE] Since I am running in a container, when I run the app it looks like there is a library trying to reach the internet, and that is where things are tripping up. Something called zeet-berri.zeet.app? The IPs resolved to amazonaws: ec2-52-37-239-96.us-west-2.compute.amazonaws.com, ec2-35-86-16-11.us-west-2.compute.amazonaws.com

Running on a connected container:

ss -atp | grep -i slirp4netns
ESTAB 0 0 192.168.1.169:37824 52.37.239.96:https users:(("slirp4netns",pid=949054,fd=10))
ESTAB 0 0 10.10.10.1:33630 10.10.10.2:webcache users:(("slirp4netns",pid=949054,fd=13))
ESTAB 0 0 192.168.1.169:51482 35.86.16.11:https users:(("slirp4netns",pid=949054,fd=12))

Running on an 'isolated' container:

ss -anput | grep slirp4netns
udp UNCONN 0 0 0.0.0.0:37887 0.0.0.0:* users:(("slirp4netns",pid=1189494,fd=7))
udp UNCONN 0 0 0.0.0.0:46187 0.0.0.0:* users:(("slirp4netns",pid=1189494,fd=4))
udp UNCONN 0 0 0.0.0.0:46274 0.0.0.0:* users:(("slirp4netns",pid=1189494,fd=10))
udp UNCONN 0 0 0.0.0.0:50229 0.0.0.0:* users:(("slirp4netns",pid=1189494,fd=3))
udp UNCONN 0 0 0.0.0.0:54771 0.0.0.0:* users:(("slirp4netns",pid=1189494,fd=9))
udp UNCONN 0 0 0.0.0.0:59316 0.0.0.0:* users:(("slirp4netns",pid=1189494,fd=8))

Can anyone tell me what is going on here, and whether there is an environment variable I can set to avoid this issue? This might be a metrics-collection thing from LiteLLM? Don't know!

[SOLVED] It looks like there is a bug in LiteLLM. I updated to the latest version and added these to model_backend.py:

import litellm
litellm.set_verbose = False
litellm.drop_params = False
litellm.telemetry = False
litellm.utils.logging.raiseExceptions = False

I also modified the Mistral prompt with LiteLLM, and things started working perfectly.

cielonet avatar Oct 12 '23 16:10 cielonet

@cielonet I'm the maintainer of litellm. I can't see the exact issue you faced. Is this because we raise errors for unmapped params?

krrishdholakia avatar Oct 13 '23 22:10 krrishdholakia

Some context would be helpful - I'd like to get this fixed on our end ASAP.

krrishdholakia avatar Oct 13 '23 22:10 krrishdholakia

@krrishdholakia No problem. I'm currently out of town and will be back on Monday; I'll repost the error message I was getting. It looked to me like the message "expected string or buffer" was generated by LiteLLM because a value (I think it was part of the logging key) in the API call was not correctly formatted. When I ran it with raiseExceptions=False, the API calls never sent that particular field and the system started working again. I did use the logging HTTP copy/paste, so if you have access to the logs/feedback people submit, you should see mine from Thursday when I was working on this (e.g. look for "expected string or buffer"). Anyway, like I said, I will be back Monday and will provide more feedback. I also suggest adding a timeout to your telemetry for when the internet is not available, because otherwise it freezes the system, and it was a pain to figure out that the telemetry was causing everything to pause until it found an internet connection. :-/ Thanks again.

cielonet avatar Oct 14 '23 18:10 cielonet

How about the PR?

noahnoahk avatar Oct 16 '23 11:10 noahnoahk

I've tried those changes locally, and trying to run the code with Azure OpenAI Service doesn't seem to work. I'll let you know if I get it to function.

OhNotWilliam avatar Oct 16 '23 12:10 OhNotWilliam

@OhNotWilliam we don't log any of the responses - it's all client-side (even the web URL you saw was just an encoded URL string). If you have the traceback, please let me know - happy to help debug.

krrishdholakia avatar Oct 16 '23 14:10 krrishdholakia

We've also had people running this via the local OpenAI proxy - https://docs.litellm.ai/docs/proxy_server
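
A minimal sketch of pointing ChatDev at that proxy (the port and dummy key below are assumptions; adjust them to however you start the proxy):

import os

# Send ChatDev's existing OpenAI-style calls to the local LiteLLM proxy
os.environ["OPENAI_API_BASE"] = "http://localhost:8000"  # assumed proxy address
os.environ["OPENAI_API_KEY"] = "anything"                # the proxy does not need a real OpenAI key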

krrishdholakia avatar Oct 16 '23 14:10 krrishdholakia

I've tried those changes locally, and trying to run the code with Azure OpenAI Service doesn't seem to work. I'll let you know if I get it to function.

@OhNotWilliam: Check my PR #192, which gets Azure working

dnhkng avatar Oct 18 '23 05:10 dnhkng

Any movement on getting this PR merged?

sammcj avatar Nov 26 '23 00:11 sammcj

Where do we stand on this? What is still outstanding/how can I help?

dsnid3r avatar Dec 12 '23 15:12 dsnid3r

@ishaan-jaff

dsnid3r avatar Dec 15 '23 16:12 dsnid3r

Hi, is this still open? Very confused.

nobodykr avatar Dec 15 '23 20:12 nobodykr

Any update on when this will be implemented?

ChieF-TroN avatar Jan 09 '24 05:01 ChieF-TroN

Ollama announced OpenAI compatibility, making LiteLLM irrelevant here: https://ollama.com/blog/openai-compatibility
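
A minimal sketch based on that announcement (assuming Ollama is serving locally on its default port and a llama2 model has been pulled; names here are illustrative):

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # required by the client, ignored by Ollama
)

response = client.chat.completions.create(
    model="llama2",  # any model pulled locally with Ollama
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response.choices[0].message.content)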

TGM avatar Feb 11 '24 22:02 TGM

@TGM thank you for the heads-up. I think this is great. Thank you.

nobodykr avatar Feb 12 '24 02:02 nobodykr

If anyone has documentation clearly explaining how to implement what is described in the title of this issue/PR, please share it. Thanks a lot. Happy coding.

hemangjoshi37a avatar Feb 20 '24 12:02 hemangjoshi37a