Local LLM llama2 extremely slow
Describe your question
I am running llama2 locally and the model is very fast when I run it from the command line with ollama run llama2. When I run it with OpenDevin, requests and responses are extremely slow and I only get Devin output every 30 minutes!
Also, any updates on when we can access llama3?
Additional context
config.toml:

```toml
LLM_MODEL="ollama/llama2"
LLM_API_KEY="ollama"
LLM_EMBEDDING_MODEL="llama2"
LLM_BASE_URL="http://127.0.0.1:11434"
WORKSPACE_BASE="./workspace"
DEBUG=1
```
Did you already follow this guide? https://github.com/OpenDevin/OpenDevin/blob/d692a72bf3809df35d802041211fcd81d56b1dc6/docs/guides/LocalLLMs.md#local-llm-guide-with-ollama-server
It looks like you're using an older approach, and not quite correctly (with the old instructions, the embedding model would be "local").
I am running it using make start-backend & make start-frontend. But OK, now following your suggestion:
```bash
docker run \
    --add-host host.docker.internal=host-gateway \
    -e LLM_API_KEY="ollama" \
    -e LLM_BASE_URL="http://localhost:11434" \
    -e WORKSPACE_MOUNT_PATH=$WORKSPACE_DIR \
    -v $WORKSPACE_DIR:/opt/workspace_base \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -p 3000:3000 \
    ghcr.io/opendevin/opendevin:main
```
invalid argument "host.docker.internal=host-gateway" for "--add-host" flag: bad format for add-host: "host.docker.internal=host-gateway"
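If your Docker CLI is older, the --add-host flag may only accept the colon-separated host:ip form rather than host=gateway, which would explain that error message. A rough sketch of the same command with that separator (pointing LLM_BASE_URL at host.docker.internal instead of localhost is also an assumption here, since localhost inside the container refers to the container itself, not the host running Ollama):

```bash
# sketch for an older Docker CLI that rejects the "host=gateway" syntax
docker run \
    --add-host host.docker.internal:host-gateway \
    -e LLM_API_KEY="ollama" \
    -e LLM_BASE_URL="http://host.docker.internal:11434" \
    -e WORKSPACE_MOUNT_PATH=$WORKSPACE_DIR \
    -v $WORKSPACE_DIR:/opt/workspace_base \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -p 3000:3000 \
    ghcr.io/opendevin/opendevin:main
```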
I think your config is correct:

```toml
LLM_MODEL="ollama/llama2"
LLM_API_KEY="ollama"
LLM_EMBEDDING_MODEL="llama2"
LLM_BASE_URL="http://127.0.0.1:11434/"
WORKSPACE_BASE="./workspace"
```
As for LLM debugging, you should set export DEBUG=1 in the environment and restart the backend. OpenDevin will then log the prompts and responses in the logs/llm/CURRENT_DATE directory, allowing you to identify the cause.
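For example, a minimal sketch of that, assuming the make start-backend workflow mentioned above:

```bash
# enable prompt/response logging and restart the backend
export DEBUG=1
make start-backend

# then inspect the date-stamped directory to see the prompts OpenDevin
# actually sends (they are much longer than a plain CLI prompt)
ls logs/llm/
```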
If you use llama2 for both chat and embeddings, OpenDevin will use the Ollama-related library, which should run at the same speed as entering the same prompt in the CLI.
Maybe you can use Ollama's guide to check where the issue is.
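One way to narrow it down is to time a raw request against the local Ollama HTTP API (the same endpoint LLM_BASE_URL points at); if that is fast, the bottleneck is more likely the size of OpenDevin's prompts than Ollama itself. A rough sketch, assuming Ollama is listening on the default port:

```bash
# time a single generation against the local Ollama server directly
time curl -s http://127.0.0.1:11434/api/generate \
  -d '{"model": "llama2", "prompt": "Why is the sky blue?", "stream": false}'
```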
@znahas unfortunately there's not much we can do about Ollama being slow. It's very resource-hungry, especially for larger models.
llama3 should work fine with ollama, from what I can tell 😄
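If you want to try it, something like the following should be enough; the model tag and config change below are assumptions that mirror the llama2 setup earlier in this thread:

```bash
# pull the model into the local Ollama server (assumes llama3 is published
# under this tag in the Ollama library)
ollama pull llama3

# then point the existing config.toml at it, keeping the other keys as-is:
#   LLM_MODEL="ollama/llama3"
```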
> I am running llama2 locally and the model is very fast when I run it from the command line with ollama run llama2. When I run it with OpenDevin, requests and responses are extremely slow
While you might get a quick answer when using llama2 directly, OpenDevin's prompt is usually very long, so if you are running Ollama on a less beefy machine (e.g. GitHub Actions), the response will be very slow.
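One way to see the effect of prompt length is Ollama's verbose timing output, which splits the response time into prompt evaluation and generation; with OpenDevin's long prompts, the prompt-evaluation part grows quickly. A rough sketch:

```bash
# --verbose prints timing stats (prompt eval count/duration, eval rate, ...)
# after the response; compare a short prompt with a long one to see the
# prompt-eval portion grow
ollama run llama2 --verbose "Say hello in one sentence."
```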
Gonna close this one since there's not much we can do.
IIRC we do have an open issue to create an agent that specializes in small local models.