[MonologueAgent] Step 0 takes longer than other steps
Describe the bug
When running a new task via the UI, there's generally a very long pause before the first step is completed.
I suspect we're doing something like starting the sandbox container, or initializing something. We could do this much sooner (e.g. on the initialize event, or on first connection, or on startup) to improve latency.
Setup and configuration
Current version:
❯ git log -n1
commit 73fb4843a39e65cc6298f8c47efeb3ed0807adf2 (HEAD -> main, origin/main, origin/HEAD)
Author: Exlo <[email protected]>
Date: Mon Apr 8 16:36:04 2024 +0200
My config.toml and environment vars (be sure to redact API keys):
LLM_MODEL="gpt-3.5-turbo-1106"
LLM_API_KEY="sk-..."
LLM_EMBEDDING_MODEL=""
WORKSPACE_DIR="./workspace"
My model and agent (you can see these settings in the UI):
- Model: gpt-3.5-turbo-1106
- Agent: MonologueAgent
Commands I ran to install and run OpenDevin:
make build
npm run start -- --port 3001 --host 0.0.0.0
poetry run uvicorn opendevin.server.listen:app --port 3000 --host 0.0.0.0
Steps to Reproduce:
- Start with default settings
- Run a task
- Time step 0 vs subsequent steps
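A minimal way to time that last step, sketched here with a hypothetical `step()` callable rather than OpenDevin's actual loop:

```python
# Hedged sketch: wrap whatever drives the agent loop and print per-step
# wall-clock time. `step` is a placeholder callable, not OpenDevin's real API.
import time

def timed_steps(step, n_steps=5):
    durations = []
    for i in range(n_steps):
        start = time.monotonic()
        step()  # step 0 is the slow one: it includes the agent's _initialize()
        elapsed = time.monotonic() - start
        durations.append(elapsed)
        print(f"step {i}: {elapsed:.1f}s")
    return durations
```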
Did some more investigation here.
The issue is that we call _initialize() on the MonologueAgent during the first step. This inserts the initial monologue into the chromadb index one document at a time, and each insert takes ~500ms.
We could hypothetically call _initialize() earlier, but we need the task text first.
The best solution here would probably be a multi-insert, i.e. batching all of the initial documents into a single call (rough sketch below).
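A sketch of what that could look like with chromadb's batch `add` (the collection name and example thoughts are illustrative, not actual OpenDevin code):

```python
import chromadb

client = chromadb.Client()
collection = client.get_or_create_collection("monologue")

# Placeholder stand-ins for the initial monologue documents.
thoughts = [
    "I exist inside a sandboxed Linux environment.",
    "I can run shell commands to inspect the workspace.",
]

# One batched call embeds and inserts all documents together,
# instead of paying the per-insert overhead ~500ms at a time.
collection.add(
    documents=thoughts,
    ids=[f"thought-{i}" for i in range(len(thoughts))],
)
```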
More info: I tried removing the chromadb dependency and using from_documents to initialize the memory. That seems to drop init time from ~38s to ~29s. Better, but still pretty terrible ☹️
It's possible we'll just need to cope with the slowness of building an index...
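For reference, the `from_documents` variant mentioned above looks roughly like this with llama_index (import paths assume the 0.10+ package layout; the sentences are placeholders):

```python
from llama_index.core import Document, VectorStoreIndex

thoughts = [
    "I exist inside a sandboxed Linux environment.",
    "I can run shell commands to inspect the workspace.",
]

# Build the whole index in one shot from the initial monologue,
# rather than inserting sentences into chromadb one by one.
index = VectorStoreIndex.from_documents([Document(text=t) for t in thoughts])
```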
Is chromadb only used with the Monologue agent?
~~The initial monologue is more or less static, I wonder if there's a way to snapshot chromadb and use that at _initialize()~~
Not true.
@foragerr yes--only Monologue uses chromadb
TBH we could probably rip it out without sacrificing much quality. But we'll want a vectordb sooner rather than later...
LLM_EMBEDDING_MODEL=""
I did some tests tonight, and it was inserts with OpenAI embeddings that took around 20 seconds, not with local ones. With local embeddings it was about 10x less, around 2-3 seconds at most. The difference is so significant that I have to ask: are you sure these tests were with local embeddings?
This was the total time of inserts in _initialize. Each insert includes getting embeddings for that sentence. I think that explains the difference.
The round-trip is probably only a part of that. I suppose it's also that the text-embedding-ada-002 model has 1,536 dimensions vs 384 for BAAI/bge-small. I don't know yet how much each factor matters.
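For what it's worth, the local side of that comparison is easy to micro-benchmark with sentence-transformers (model name as discussed above; the sentences are placeholders):

```python
import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5")
sentences = [f"monologue sentence {i}" for i in range(40)]

start = time.monotonic()
embeddings = model.encode(sentences)  # one batched call, no network round-trip
print(embeddings.shape, f"{time.monotonic() - start:.2f}s")
```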
Apart from multi-insert, which I assume would still mean getting embeddings per sentence... I wonder if we can get embeddings once for some bigger chunk of the monologue, like several related sentences, and save those?
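A sketch of that idea: group several related sentences into one document before inserting, so each chunk gets a single embedding (the chunk size here is arbitrary):

```python
def chunk_sentences(sentences, chunk_size=5):
    """Join groups of related sentences so each chunk needs only one embedding."""
    return [
        " ".join(sentences[i:i + chunk_size])
        for i in range(0, len(sentences), chunk_size)
    ]

initial_monologue = [f"thought {i}" for i in range(20)]  # placeholder sentences
chunks = chunk_sentences(initial_monologue)
# 20 sentences -> 4 chunks -> 4 embedding calls instead of 20
```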
Using BAAI/bge-small could be a serious improvement!
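If we go that route, pointing llama_index at the local model is a one-liner (this assumes the llama-index-embeddings-huggingface package; the exact wiring into OpenDevin's config is not shown here):

```python
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# 384-dimensional vectors, computed locally: no network round-trip per insert.
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
```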