Local LLM llama2 extremely slow
Describe your question
I am running llama2 locally and the model is very fast when I run it from the command line with ollama run llama2. When I run it with OpenDevin, requests and responses are extremely slow and I only get Devin output every 30 minutes!
Also, any updates on when we can access llama3?
Additional context
config.toml:

```toml
LLM_MODEL="ollama/llama2"
LLM_API_KEY="ollama"
LLM_EMBEDDING_MODEL="llama2"
LLM_BASE_URL="http://127.0.0.1:11434"
WORKSPACE_BASE="./workspace"
DEBUG=1
```
Did you already follow this guide? https://github.com/OpenDevin/OpenDevin/blob/d692a72bf3809df35d802041211fcd81d56b1dc6/docs/guides/LocalLLMs.md#local-llm-guide-with-ollama-server
It looks like you're using an older approach, and not quite correctly (with the old instructions, the embedding model would be "local").
I am running it using make start-backend & make start-frontend. But OK, now following your suggestion:
```bash
docker run \
    --add-host host.docker.internal=host-gateway \
    -e LLM_API_KEY="ollama" \
    -e LLM_BASE_URL="http://localhost:11434" \
    -e WORKSPACE_MOUNT_PATH=$WORKSPACE_DIR \
    -v $WORKSPACE_DIR:/opt/workspace_base \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -p 3000:3000 \
    ghcr.io/opendevin/opendevin:main
```
invalid argument "host.docker.internal=host-gateway" for "--add-host" flag: bad format for add-host: "host.docker.internal=host-gateway"
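If your Docker CLI is older, the --add-host flag may only accept the colon-separated host:ip form rather than host=gateway, which would explain that error message. A rough sketch of the same command with that separator (pointing LLM_BASE_URL at host.docker.internal instead of localhost is also an assumption here, since localhost inside the container refers to the container itself, not the host running Ollama):

```bash
# sketch for an older Docker CLI that rejects the "host=gateway" syntax
docker run \
    --add-host host.docker.internal:host-gateway \
    -e LLM_API_KEY="ollama" \
    -e LLM_BASE_URL="http://host.docker.internal:11434" \
    -e WORKSPACE_MOUNT_PATH=$WORKSPACE_DIR \
    -v $WORKSPACE_DIR:/opt/workspace_base \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -p 3000:3000 \
    ghcr.io/opendevin/opendevin:main
```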
I think your config is correct:

```toml
LLM_MODEL="ollama/llama2"
LLM_API_KEY="ollama"
LLM_EMBEDDING_MODEL="llama2"
LLM_BASE_URL="http://127.0.0.1:11434/"
WORKSPACE_BASE="./workspace"
```
As for LLM debugging, you should set export DEBUG=1 in the environment and restart the backend. OpenDevin will then log the prompts and responses in the logs/llm/CURRENT_DATE directory, allowing you to identify the cause.
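For example, a minimal sketch of that, assuming the make start-backend workflow mentioned above:

```bash
# enable prompt/response logging and restart the backend
export DEBUG=1
make start-backend

# then inspect the date-stamped directory to see the prompts OpenDevin
# actually sends (they are much longer than a plain CLI prompt)
ls logs/llm/
```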
If you use llama2 for both chat and embeddings, OpenDevin will use the Ollama-related library, which should run at the same speed as entering the same prompt in the CLI.
Maybe you can use Ollama's guide to check where the issue is.
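One way to narrow it down is to time a raw request against the local Ollama HTTP API (the same endpoint LLM_BASE_URL points at); if that is fast, the bottleneck is more likely the size of OpenDevin's prompts than Ollama itself. A rough sketch, assuming Ollama is listening on the default port:

```bash
# time a single generation against the local Ollama server directly
time curl -s http://127.0.0.1:11434/api/generate \
  -d '{"model": "llama2", "prompt": "Why is the sky blue?", "stream": false}'
```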
@znahas unfortunately there's not much we can do about Ollama being slow. It's very resource-hungry, especially for larger models.
llama3 should work fine with ollama, from what I can tell 😄
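If you want to try it, something like the following should be enough; the model tag and config change below are assumptions that mirror the llama2 setup earlier in this thread:

```bash
# pull the model into the local Ollama server (assumes llama3 is published
# under this tag in the Ollama library)
ollama pull llama3

# then point the existing config.toml at it, keeping the other keys as-is:
#   LLM_MODEL="ollama/llama3"
```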
> I am running llama2 locally and the model is very fast when I run it from the command line with ollama run llama2. When I run it with OpenDevin, requests and responses are extremely slow
While you might get a quick answer when using llama2 directly, OpenDevin's prompt is usually very long, so if you are running Ollama on a less beefy machine (e.g. GitHub Actions), the response will be very slow.
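One way to see the effect of prompt length is Ollama's verbose timing output, which splits the response time into prompt evaluation and generation; with OpenDevin's long prompts, the prompt-evaluation part grows quickly. A rough sketch:

```bash
# --verbose prints timing stats (prompt eval count/duration, eval rate, ...)
# after the response; compare a short prompt with a long one to see the
# prompt-eval portion grow
ollama run llama2 --verbose "Say hello in one sentence."
```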
Gonna close this one since there's not much we can do.
IIRC we do have an open issue to create an agent that specializes in small local models.