Basic example with a remote LiteLLM
Hi,
I'm trying to run the basic example using LiteLLM with the ollama2 model running in a container pulled from https://hub.docker.com/r/litellm/ollama, but I can't reach the LLM. The request seems to be sent to http://127.0.0.1:8000/api/generate, but the LLM is not reached properly and the server returns 404.
See more details below.
Any idea?
Thanks!
More details:
The container is started with: docker run -p 8000:8000 --name ollama litellm/ollama
I ran part of an autogen demo and the LiteLLM container seems to work there: the server is reached and the LLM replies. Autogen then fails at another step, so I am trying CrewAI instead.
I am trying to run the CrewAI example as described in the README.md; these are the changes I made:
```python
from crewai import Agent, Task, Crew, Process
from langchain.schema import HumanMessage
from langchain_community.chat_models import ChatLiteLLM
from litellm import litellm

ollama_chatlitellm = ChatLiteLLM(
    model="ollama/ollama2",
    api_base="http://127.0.0.1:8000",
    api_type="open_ai",
    api_key="",
)

researcher = Agent(
    role='Researcher',
    goal='Discover new insights',
    backstory="You're a world class researcher working on a major data science company",
    verbose=True,
    allow_delegation=False,
    llm=ollama_chatlitellm,
    debug_mode=True,
)

# ... the rest is the same as in the README.md
```
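For reference, the ChatLiteLLM wrapper can also be exercised outside of CrewAI to check whether the 404 comes from the LLM connection itself rather than from the agent setup. A minimal sketch, reusing the ollama_chatlitellm object defined above (a hypothetical isolation test, not part of the README example):

```python
# Isolation test: call the wrapper directly, outside of CrewAI.
# If this also raises APIConnectionError / returns 404, the problem is
# between ChatLiteLLM and the litellm/ollama container, not in the agents.
from langchain.schema import HumanMessage

print(ollama_chatlitellm.invoke([HumanMessage(content="Say hello")]))
```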
Client logs:
Working Agent: Researcher
Starting Task: Investigate the latest AI trends ...
> Entering new AgentExecutor chain...
kwargs[caching]: False; litellm.cache: None
LiteLLM completion() model= ollama2; provider = ollama
LiteLLM: Params passed to completion() {'functions': None, 'function_call': None, 'temperature': 1, 'top_p': None, 'stream': False, 'max_tokens': 256, 'presence_penalty': None, 'frequency_penalty': None, 'logit_bias': None, 'user': None, 'response_format': None, 'seed': None, 'tools': None, 'tool_choice': None, 'max_retries': None, 'logprobs': None, 'top_logprobs': None, 'custom_llm_provider': 'ollama', 'model': 'ollama2', 'n': 1, 'stop': ['\nObservation']}
LiteLLM: Non-Default params passed to completion() {'temperature': 1, 'stream': False, 'max_tokens': 256, 'n': 1, 'stop': ['\nObservation']}
self.optional_params: {'num_predict': 256, 'temperature': 1, 'stop_sequences': ['\nObservation']}
PRE-API-CALL ADDITIONAL ARGS: {'api_base': 'http://127.0.0.1:8000/api/generate', 'complete_input_dict': {'model': 'ollama2', 'prompt': "You are Researcher.\nYou're a world class researcher working on a major data science company\n\nYour personal goal is: Discover new insights\n\nTOOLS:\n------\nYou have access to the following tools:\n\n\n\nTo use a tool, please use the exact following format:\n\n```\nThought: Do I need to use a tool? Yes\nAction: the action to take, should be one of [], just the name.\nAction Input: the input to the action\nObservation: the result of the action\n```\n\nWhen you have a response for your task, or if you do not need to use a tool, you MUST use the format:\n\n```\nThought: Do I need to use a tool? No\nFinal Answer: [your response here]\n```This is the summary of your work so far:\nBegin! This is VERY important to you, your job depends on it!\n\nCurrent Task: Investigate the latest AI trends\n", 'num_predict': 256, 'temperature': 1, 'stop_sequences': ['\nObservation'], 'stream': False}, 'headers': {}, 'acompletion': False}
POST Request Sent from LiteLLM:
curl -X POST \
http://127.0.0.1:8000/api/generate \
-d '{'model': 'ollama2', 'prompt': "You are Researcher.\nYou're a world class researcher working on a major data science company\n\nYour personal goal is: Discover new insights\n\nTOOLS:\n------\nYou have access to the following tools:\n\n\n\nTo use a tool, please use the exact following format:\n\n```\nThought: Do I need to use a tool? Yes\nAction: the action to take, should be one of [], just the name.\nAction Input: the input to the action\nObservation: the result of the action\n```\n\nWhen you have a response for your task, or if you do not need to use a tool, you MUST use the format:\n\n```\nThought: Do I need to use a tool? No\nFinal Answer: [your response here]\n```This is the summary of your work so far:\nBegin! This is VERY important to you, your job depends on it!\n\nCurrent Task: Investigate the latest AI trends\n", 'num_predict': 256, 'temperature': 1, 'stop_sequences': ['\nObservation'], 'stream': False}'
Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.
Logging Details: logger_fn - None | callable(logger_fn) - False
Logging Details LiteLLM-Failure Call
self.failure_callback: []
Retrying langchain_community.chat_models.litellm.ChatLiteLLM.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised APIConnectionError: {"detail":"Not Found"}.
kwargs[caching]: False; litellm.cache: None
LiteLLM completion() model= ollama2; provider = ollama
LiteLLM: Params passed to completion() {'functions': None, 'function_call': None, 'temperature': 1, 'top_p': None, 'stream': False, 'max_tokens': 256, 'presence_penalty': None, 'frequency_penalty': None, 'logit_bias': None, 'user': None, 'response_format': None, 'seed': None, 'tools': None, 'tool_choice': None, 'max_retries': None, 'logprobs': None, 'top_logprobs': None, 'custom_llm_provider': 'ollama', 'model': 'ollama2', 'n': 1, 'stop': ['\nObservation']}
LiteLLM: Non-Default params passed to completion() {'temperature': 1, 'stream': False, 'max_tokens': 256, 'n': 1, 'stop': ['\nObservation']}
self.optional_params: {'num_predict': 256, 'temperature': 1, 'stop_sequences': ['\nObservation']}
PRE-API-CALL ADDITIONAL ARGS: {'api_base': 'http://127.0.0.1:8000/api/generate', 'complete_input_dict': {'model': 'ollama2', 'prompt': "You are Researcher.\nYou're a world class researcher working on a major data science company\n\nYour personal goal is: Discover new insights\n\nTOOLS:\n------\nYou have access to the following tools:\n\n\n\nTo use a tool, please use the exact following format:\n\n```\nThought: Do I need to use a tool? Yes\nAction: the action to take, should be one of [], just the name.\nAction Input: the input to the action\nObservation: the result of the action\n```\n\nWhen you have a response for your task, or if you do not need to use a tool, you MUST use the format:\n\n```\nThought: Do I need to use a tool? No\nFinal Answer: [your response here]\n```This is the summary of your work so far:\nBegin! This is VERY important to you, your job depends on it!\n\nCurrent Task: Investigate the latest AI trends\n", 'num_predict': 256, 'temperature': 1, 'stop_sequences': ['\nObservation'], 'stream': False}, 'headers': {}, 'acompletion': False}
Server logs:
INFO: 172.17.0.1:61256 - "POST /api/generate HTTP/1.1" 404 Not Found
INFO: 172.17.0.1:61258 - "POST /api/generate HTTP/1.1" 404 Not Found
INFO: 172.17.0.1:61260 - "POST /api/generate HTTP/1.1" 404 Not Found
...
Can you first test your LiteLLM with ollama2 model setup to make sure it works by itself? Maybe it gives you a curl command you can run to test the API endpoint?
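One way to do that is to call LiteLLM directly, without CrewAI or LangChain in between. A minimal sketch, assuming the container above is still listening on 127.0.0.1:8000; the model name must match a model the container actually serves (e.g. llama2 rather than ollama2):

```python
# Standalone LiteLLM sanity check, independent of CrewAI/LangChain.
from litellm import completion

response = completion(
    model="ollama/ollama2",  # adjust to the model actually served, e.g. "ollama/llama2"
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    api_base="http://127.0.0.1:8000",
)
print(response.choices[0].message.content)
```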
Thanks @fxtoofaan. I decided to run CrewAI directly with Ollama, without LiteLLM, and it worked. I followed the Ollama Docker image instructions:
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama run llama2
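On the CrewAI side the agent definition stays the same; only the LLM wrapper changes. A minimal sketch of that direct setup, assuming the langchain_community Ollama wrapper and the llama2 model pulled above (exact parameter names may vary between versions):

```python
# Point the agent at the local Ollama server instead of the LiteLLM proxy.
# Assumes Ollama listens on its default port 11434 and llama2 has been pulled.
from crewai import Agent
from langchain_community.llms import Ollama

ollama_llm = Ollama(model="llama2", base_url="http://127.0.0.1:11434")

researcher = Agent(
    role='Researcher',
    goal='Discover new insights',
    backstory="You're a world class researcher working on a major data science company",
    verbose=True,
    allow_delegation=False,
    llm=ollama_llm,
)
```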
Best,