[MonologueAgent] Step 0 takes longer than other steps
Describe the bug
When running a new task via the UI, there's generally a very long pause before the first step is completed.
I suspect we're doing something like starting the sandbox container, or initializing something. We could do this much sooner (e.g. on the initialize event, or on first connection, or on startup) to improve latency.
Setup and configuration
Current version:
❯ git log -n1
commit 73fb4843a39e65cc6298f8c47efeb3ed0807adf2 (HEAD -> main, origin/main, origin/HEAD)
Author: Exlo <[email protected]>
Date: Mon Apr 8 16:36:04 2024 +0200
My config.toml and environment vars (be sure to redact API keys):
LLM_MODEL="gpt-3.5-turbo-1106"
LLM_API_KEY="sk-..."
LLM_EMBEDDING_MODEL=""
WORKSPACE_DIR="./workspace"
My model and agent (you can see these settings in the UI):
- Model: gpt-3.5-turbo-1106
- Agent: MonologueAgent
Commands I ran to install and run OpenDevin:
make build
npm run start -- --port 3001 --host 0.0.0.0
poetry run uvicorn opendevin.server.listen:app --port 3000 --host 0.0.0.0
Steps to Reproduce:
- Start with default settings
- Run a task
- Time step 0 vs subsequent steps
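A minimal way to time that last step, sketched here with a hypothetical `step()` callable rather than OpenDevin's actual loop:

```python
# Hedged sketch: wrap whatever drives the agent loop and print per-step
# wall-clock time. `step` is a placeholder callable, not OpenDevin's real API.
import time

def timed_steps(step, n_steps=5):
    durations = []
    for i in range(n_steps):
        start = time.monotonic()
        step()  # step 0 is the slow one: it includes the agent's _initialize()
        elapsed = time.monotonic() - start
        durations.append(elapsed)
        print(f"step {i}: {elapsed:.1f}s")
    return durations
```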
Did some more investigation here.
The issue is that we call _initialize() on the MonologueAgent during the first step. This inserts the initial monologue into the chromadb index one document at a time, and each insert takes ~500ms.
We could hypothetically call _initialize() earlier, but we need the task text first.
The best solution here would probably be a multi-insert, i.e. batching all of the initial documents into a single call (rough sketch below).
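A sketch of what that could look like with chromadb's batch `add` (the collection name and example thoughts are illustrative, not actual OpenDevin code):

```python
import chromadb

client = chromadb.Client()
collection = client.get_or_create_collection("monologue")

# Placeholder stand-ins for the initial monologue documents.
thoughts = [
    "I exist inside a sandboxed Linux environment.",
    "I can run shell commands to inspect the workspace.",
]

# One batched call embeds and inserts all documents together,
# instead of paying the per-insert overhead ~500ms at a time.
collection.add(
    documents=thoughts,
    ids=[f"thought-{i}" for i in range(len(thoughts))],
)
```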
More info: I tried removing the chromadb dependency and using from_documents to initialize the memory. That seems to drop init time from ~38s to ~29s. Better, but still pretty terrible ☹️
It's possible we'll just need to cope with the slowness of building an index...
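For reference, the `from_documents` variant mentioned above looks roughly like this with llama_index (import paths assume the 0.10+ package layout; the sentences are placeholders):

```python
from llama_index.core import Document, VectorStoreIndex

thoughts = [
    "I exist inside a sandboxed Linux environment.",
    "I can run shell commands to inspect the workspace.",
]

# Build the whole index in one shot from the initial monologue,
# rather than inserting sentences into chromadb one by one.
index = VectorStoreIndex.from_documents([Document(text=t) for t in thoughts])
```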
Is chromadb only used with the Monologue agent?
~~The initial monologue is more or less static, I wonder if there's a way to snapshot chromadb and use that at _initialize()~~
Not true.
@foragerr yes--only Monologue uses chromadb
TBH we could probably rip it out without sacrificing much quality. But we'll want a vectordb sooner rather than later...
LLM_EMBEDDING_MODEL=""
I did some tests tonight, and it was inserts with OpenAI embeddings that took around 20 seconds, not with local ones. With local embeddings it was about 10x less, around 2-3 seconds at most. The difference is so significant that I have to ask: are you sure these tests were with local embeddings?
This was the total time of inserts in _initialize. Each insert includes getting embeddings for that sentence. I think that explains the difference.
The round-trip is probably only a part of that. I suppose it's also that the text-embedding-ada-002 model has 1,536 dimensions vs 384 for BAAI/bge-small. I don't know yet how much each factor matters.
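For what it's worth, the local side of that comparison is easy to micro-benchmark with sentence-transformers (model name as discussed above; the sentences are placeholders):

```python
import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5")
sentences = [f"monologue sentence {i}" for i in range(40)]

start = time.monotonic()
embeddings = model.encode(sentences)  # one batched call, no network round-trip
print(embeddings.shape, f"{time.monotonic() - start:.2f}s")
```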
Apart from multi-insert, which I assume would still mean getting embeddings per sentence... I wonder if we can get embeddings once for some bigger chunk of the monologue, like several related sentences, and save those?
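A sketch of that idea: group several related sentences into one document before inserting, so each chunk gets a single embedding (the chunk size here is arbitrary):

```python
def chunk_sentences(sentences, chunk_size=5):
    """Join groups of related sentences so each chunk needs only one embedding."""
    return [
        " ".join(sentences[i:i + chunk_size])
        for i in range(0, len(sentences), chunk_size)
    ]

initial_monologue = [f"thought {i}" for i in range(20)]  # placeholder sentences
chunks = chunk_sentences(initial_monologue)
# 20 sentences -> 4 chunks -> 4 embedding calls instead of 20
```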
Using BAAI/bge-small could be a serious improvement!
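If we go that route, pointing llama_index at the local model is a one-liner (this assumes the llama-index-embeddings-huggingface package; the exact wiring into OpenDevin's config is not shown here):

```python
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# 384-dimensional vectors, computed locally: no network round-trip per insert.
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
```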