
Using llama2 locally, repeated requests to '/chat/completions' return 404 from ollama serve.

Open 2868151647 opened this issue 1 year ago • 15 comments

Describe the bug

I use llama2. After sending 'hello' in the frontend, I saw repeated requests to '/api/embeddings' returning HTTP 200, mixed with requests to '/chat/completions' returning HTTP 404, on the ollama serve side. I also saw 99 steps of log output on the backend.

Setup and configuration

Current version:

commit e9121b78fed0b5ef36718ca0bf59588c0b094b86 (HEAD -> main)
Author: Xingyao Wang <[email protected]>
Date:   Sun Apr 7 16:07:59 2024 +0800

use .getLogger to avoid same logging message to get printed twice (#850)

My config.toml and environment vars (be sure to redact API keys):

LLM Model name: ollama/llama2
LLM API key: ''
LLM Base URL: localhost:11434
LLM Embedding Model: llama2
local model URL: localhost:11434
workspace: ./workspace

Note: I use the real IP rather than localhost to solve communication problems between Win10 and WSL2.
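
For reference, a minimal sketch of how these settings are usually expressed as environment variables (key names taken from the working example later in this thread; the IP address is a placeholder, substitute whatever address the backend can actually reach, and keep the http:// scheme in the base URL):

export LLM_API_KEY="ollama"                      # ollama ignores the key, but it must be set
export LLM_BASE_URL="http://192.168.1.50:11434"  # placeholder IP; use the bridged WSL2/Win10 address
export LLM_MODEL="ollama/llama2"
export LLM_EMBEDDING_MODEL="llama2"
export WORKSPACE_DIR="./workspace"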

My model and agent (you can see these settings in the UI):

  • Model:ollama/llama2
  • Agent: MonologueAgent

Commands I ran to install and run OpenDevin:

make setup-config
make start-backend
make start-frontend

Steps to Reproduce:

  1. Set the config.
  2. Start the backend, the frontend, and ollama serve.
  3. Input 'hello' on the frontend and send.
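
Before step 3, it can help to confirm that ollama serve is reachable from the machine running the backend; a quick check against the default ollama port is:

curl http://localhost:11434/api/tags   # lists the pulled models; swap in the bridged IP when calling from WSL2

If this call fails, the base URL in the OpenDevin config needs to point at the address where ollama is actually listening.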

Logs, error messages, and screenshots:

Traceback (most recent call last):
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1437, in function_with_retries
    response = original_function(*args, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 387, in _completion
    raise e
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 335, in _completion
    deployment = self.get_available_deployment(
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 2443, in get_available_deployment
    raise ValueError(f"No healthy deployment available, passed model={model}")
ValueError: No healthy deployment available, passed model=gpt-3.5-turbo-1106

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/OpenDevin/agenthub/monologue_agent/utils/monologue.py", line 70, in condense
    resp = llm.completion(messages=messages)
  File "/home/user/OpenDevin/opendevin/llm/llm.py", line 58, in wrapper
    resp = completion_unwrapped(*args, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 329, in completion
    raise e
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 326, in completion
    response = self.function_with_fallbacks(**kwargs)
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1420, in function_with_fallbacks
    raise original_exception
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1345, in function_with_fallbacks
    response = self.function_with_retries(*args, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1497, in function_with_retries
    raise e
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1463, in function_with_retries
    response = original_function(*args, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 387, in _completion
    raise e
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 335, in _completion
    deployment = self.get_available_deployment(
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 2443, in get_available_deployment
    raise ValueError(f"No healthy deployment available, passed model={model}")
ValueError: No healthy deployment available, passed model=gpt-3.5-turbo-1106

ERROR: Error condensing thoughts: No healthy deployment available, passed model=gpt-3.5-turbo-1106


Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.


Traceback (most recent call last):
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/llms/openai.py", line 414, in completion
    raise e
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/llms/openai.py", line 373, in completion
    response = openai_client.chat.completions.create(**data, timeout=timeout)  # type: ignore
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/openai/_utils/_utils.py", line 275, in wrapper
    return func(*args, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/openai/resources/chat/completions.py", line 667, in create
    return self._post(
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/openai/_base_client.py", line 1213, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/openai/_base_client.py", line 902, in request
    return self._request(
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/openai/_base_client.py", line 993, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: 404 page not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/main.py", line 997, in completion
    raise e
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/main.py", line 970, in completion
    response = openai_chat_completions.completion(
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/llms/openai.py", line 420, in completion
    raise OpenAIError(status_code=e.status_code, message=str(e))
litellm.llms.openai.OpenAIError: 404 page not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/OpenDevin/agenthub/monologue_agent/utils/monologue.py", line 70, in condense
    resp = llm.completion(messages=messages)
  File "/home/user/OpenDevin/opendevin/llm/llm.py", line 58, in wrapper
    resp = completion_unwrapped(*args, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 329, in completion
    raise e
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 326, in completion
    response = self.function_with_fallbacks(**kwargs)
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1420, in function_with_fallbacks
    raise original_exception
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1345, in function_with_fallbacks
    response = self.function_with_retries(*args, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1497, in function_with_retries
    raise e
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1463, in function_with_retries
    response = original_function(*args, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 387, in _completion
    raise e
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 370, in _completion
    response = litellm.completion(
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/utils.py", line 2947, in wrapper
    raise e
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/utils.py", line 2845, in wrapper
    result = original_function(*args, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/main.py", line 2129, in completion
    raise exception_type(
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/utils.py", line 8526, in exception_type
    raise e
  File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/utils.py", line 7344, in exception_type
    raise NotFoundError(
litellm.exceptions.NotFoundError: OpenAIException - 404 page not found

ERROR: Error condensing thoughts: OpenAIException - 404 page not found

Additional Context

  • use WSL2 on win10

2868151647 avatar Apr 12 '24 13:04 2868151647

ValueError: No healthy deployment available, passed model=gpt-3.5-turbo-1106

Seems LLM_MODEL is not configured correctly.

--

My config.toml and environment vars (be sure to redact API keys):

LLM Model

Are the underscores there?
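
A quick way to double-check what the backend is actually picking up (a sketch, assuming config.toml sits in the repository root as created by make setup-config):

grep -i llm config.toml   # the model entry should read ollama/llama2
env | grep LLM_           # check whether any LLM_ environment variables are set as well

If the model still resolves to gpt-3.5-turbo-1106, the value from config.toml is not being applied.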

SmartManoj avatar Apr 12 '24 14:04 SmartManoj

@SmartManoj sorry, that was a problem in my description. I checked config.toml and it is correct (screenshot: cut1).

2868151647 avatar Apr 12 '24 15:04 2868151647

I recommend using a litellm proxy in front of the ollama server, as the direct implementation is buggy. Here is an example config:

LLM_API_KEY="ollama"
LLM_BASE_URL="http://localhost:4000"
LLM_MODEL="ollama/dolphin"
LLM_EMBEDDING_MODEL="llama"
WORKSPACE_DIR="./workspace"
MAX_ITERATIONS=100

with litellm server:

litellm --model ollama/dolphin --api_base http://localhost:11434
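
Before pointing OpenDevin at the proxy, it is worth confirming that the proxy answers on its OpenAI-compatible route (a sketch, assuming the ollama/dolphin setup above and the default proxy port 4000):

curl http://localhost:4000/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ollama/dolphin", "messages": [{"role": "user", "content": "hello"}]}'

A JSON completion here means the proxy-to-ollama leg works, and any remaining 404s would come from the OpenDevin side.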

dproworld avatar Apr 12 '24 16:04 dproworld

I did this but I keep getting

Oops. Something went wrong: Invalid \escape: line 2 column 18 (char 19)

on the front end and

ERROR: Invalid \escape: line 2 column 18 (char 19)
Traceback (most recent call last):
  File "/home/meng/OpenDevin/opendevin/controller/agent_controller.py", line 135, in step
    action = self.agent.step(self.state)
  File "/home/meng/OpenDevin/agenthub/planner_agent/agent.py", line 44, in step
    action = parse_response(action_resp)
  File "/home/meng/OpenDevin/agenthub/planner_agent/prompt.py", line 224, in parse_response
    action_dict = json.loads(response)
  File "/usr/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.11/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid \escape: line 2 column 18 (char 19)

on the backend. What may be the issue?

LLM_API_KEY="ollama"
LLM_BASE_URL="http://localhost:4000"
LLM_MODEL="ollama/dolphin"
LLM_EMBEDDING_MODEL="llama"
WORKSPACE_DIR="./workspace"
MAX_ITERATIONS=100

with litellm server:

litellm --model ollama/dolphin --api_base http://localhost:11434

menguzat avatar Apr 12 '24 17:04 menguzat

I think there is no need to proxy the ollama serve in my case; I already send requests and get responses. I set up a network bridge between WSL2 and Win10 and changed the IP address so that they are on the same network segment. I think we used different methods to achieve the same goal.

2868151647 avatar Apr 13 '24 01:04 2868151647

@menguzat In "/home/meng/OpenDevin/agenthub/planner_agent/agent.py", at line 43, add print(action_resp). The error is due to the low quality of the model. Check out Gemini 1.5 Pro.

SmartManoj avatar Apr 13 '24 01:04 SmartManoj

Hmmm... I was trying out Mistral Instruct. I really want to use this with local LLMs so I can tinker with it without worrying about costs. Any models to recommend?

@menguzat File "/home/meng/OpenDevin/agenthub/planner_agent/agent.py", line 43, add print(action_resp) Error due to low quality of the model. Check out Gemini 1.5 pro

menguzat avatar Apr 13 '24 09:04 menguzat

Gemini 1.5 pro is free until May 2?

SmartManoj avatar Apr 13 '24 10:04 SmartManoj

I have generally seen this 404 error when the model is set to something unavailable.

rbren avatar Apr 21 '24 19:04 rbren

@rbren can you please share what you've done for llama3? In this discussion, it seems you stated it should work.

The settings in the client frontend at port 3000 only list ollama/llama2 and previous versions, with gpt-3.5-turbo as the default; I can't tell yet where this list is retrieved from.

I also looked through the code and found no 'llama3' strings, only 'llama2', which is generally needed as the model name on the requests, but it might be that the env variables handle that part for us...

It kind of surprises me that the OpenDevin client doesn't just reassure the user that the connection to the model server works, as part of the prep for further input, e.g. by pinging the show endpoint http://localhost:11434/api/show of your ollama container with the request:

{
  "name": "llama2"
}

as shown here in the API docs. Again, you'd have to state llama3 instead if that applies. The issue of llama3 not being listed in the frontend settings applies here as well (although I have a feeling the list is fetched from a remote URL, as there is a delay going from empty to a populated dropdown).
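
For example, a direct check from the host could look like this (a sketch; substitute llama3, or whatever tag was pulled, for the name):

curl -X POST http://localhost:11434/api/show \
  -H "Content-Type: application/json" \
  -d '{"name": "llama2"}'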

@SmartManoj maybe try the following to confirm connectivity? Container-to-container ping:

docker exec -it <your-client-container> ping <your-ollama-container-name>

Or make the curl call from one container to the other:

docker exec -it <your-client-container> curl -X POST -H "Content-Type: application/json" -d '{
  "name": "llama2"
}' http://<your-ollama-container-name>:<port>/<endpoint>

There are two possibilities here that I just have to guess on, due to the fact that I don't have enough time to wade through the code at the moment:

  1. The requests are made from the browser client to ollama, not from within the OpenDevin server/container, in which case my advice doesn't apply, but you can just use Postman anyway.
  2. The requests are made from within the server/container, in which case my advice above applies. Note that the hostname may not be localhost anymore, since your docker containers use the internal DNS, so you would need to use the container name <your-ollama-container-name> instead of localhost, as shown above.

Here's what I used for my shell script; this was the only editing (apart from the settings dropdown in the frontend UI) I had to do to run it (and I still had that NoneType request attribute error repeating itself forever):

export WORKSPACE_DIR=$(pwd)/workspace
docker run \
    --add-host host.docker.internal=host-gateway \
    -e LLM_API_KEY="11111111111111111111" \
    -e WORKSPACE_DIR="workspace" \
    -e LLM_BASE_URL="http://localhost:11434" \
    -e LLM_MODEL="ollama/llama2" \
    -e LLM_EMBEDDING_MODEL="llama2" \
    -e WORKSPACE_MOUNT_PATH=$WORKSPACE_DIR \
    -v $WORKSPACE_DIR:/opt/workspace_base \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -p 3000:3000 \
    ghcr.io/opendevin/opendevin:main
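
One thing worth checking in this script (an observation, not a confirmed fix): it maps host.docker.internal to the host gateway but still points LLM_BASE_URL at localhost, which inside the container refers to the container itself rather than the host where ollama is listening. A variant using the mapped hostname would be:

export WORKSPACE_DIR=$(pwd)/workspace
# identical to the script above except for LLM_BASE_URL, which now targets ollama on the host
docker run \
    --add-host host.docker.internal=host-gateway \
    -e LLM_API_KEY="11111111111111111111" \
    -e WORKSPACE_DIR="workspace" \
    -e LLM_BASE_URL="http://host.docker.internal:11434" \
    -e LLM_MODEL="ollama/llama2" \
    -e LLM_EMBEDDING_MODEL="llama2" \
    -e WORKSPACE_MOUNT_PATH=$WORKSPACE_DIR \
    -v $WORKSPACE_DIR:/opt/workspace_base \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -p 3000:3000 \
    ghcr.io/opendevin/opendevin:main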

Shame I can't use OpenDevin yet, but I wanna thank you guys for your great work, looking forward to being a future user someday.

Aeonitis avatar Apr 22 '24 12:04 Aeonitis

Did you check this to use without docker?

SmartManoj avatar Apr 22 '24 12:04 SmartManoj

I saw it, but I wasn't interested in working without dockerized containers, thanks

Aeonitis avatar Apr 22 '24 12:04 Aeonitis

Used this command? docker exec -it opendevin python opendevin/main.py -d /workspace -t "write bash script to print 5"

SmartManoj avatar Apr 22 '24 13:04 SmartManoj

@Aeonitis to be clear--I have not used llama3.

You can type any model you want into the UI, even if it doesn't auto-complete--setting ollama/llama3 (or whatever was passed to ollama pull) should do the trick
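
To see exactly what string to type, the ollama CLI lists whatever has been pulled (llama3 here is just an example tag):

ollama pull llama3   # download the model if it is not there yet
ollama list          # shows the exact model names ollama knows about

The model field in OpenDevin would then be ollama/ followed by that name, e.g. ollama/llama3.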

rbren avatar Apr 23 '24 22:04 rbren

It kind of surprises me that the OpenDevin client doesn't just reassure the user that the connection to the model server works, as part of the prep for further input, e.g. by pinging the show endpoint

We're mostly trying to stay LLM/provider agnostic, but we do have this issue: https://github.com/OpenDevin/OpenDevin/issues/923

rbren avatar Apr 23 '24 22:04 rbren

It seems multiple issues ended up in this one issue, but the original author found a solution. Please feel free to open a new issue!

mamoodi avatar Jun 08 '24 16:06 mamoodi