OpenHands Refactor config to dataclasses

Config has become big and is still growing. This PR refactors it in dataclasses and breaks out a couple of sub-sections, llm and agent, that are relatively self-contained. Should also clean up some older hacks. Bonus: mypy now figured out what's up here, and it didn't like things! (for good reason)

Goals: [x] refactor config into data classes [x] keep compability with env and current toml (Makefile updated, current toml seems less important to keep) [x] read new toml [x] serialize for the ui [x] adapt Makefile

Misc fixes: wrong types checks for config vars, integration tests for sandboxes not mounting in docker

Also, @li-boxuan suggestion implemented for CLI:

can define groups of LLM settings in config.toml, such as:

[gpt4]
model = "got-4-1106-preview"
api_key = "sk..."
embeddings_model = "openai"
...

a group can include any attribute of the LLMConfig dataclass, all optional
usage e.g.: poetry run python opendevin/core/main.py ... -l gpt4
only one of -l (--llm_config) or -m (--model) needs to be provided
if both are passed, then llm_config.model overrides model (...this doesn't sound right)

Notes:

~~tested without UI so far~~ looks like it still works with UI
it keeps compatibility with current vars
sandboxes vars seem tricky.
- this PR removes None as possible value for some paths, thus possibly (?) changes behavior.
- changes integration tests execution, they were running with None values for almost every config var, thus not mounting in docker and other effects.
- removes None for user id
- refactored all globals in sandboxes into properties @xingyaoww
tested for old and new vars
tested for types (e.g. retries vars need to be int), cleaned up some hacks
updatable singletons are a bit too much fun, feedback welcome if it is too risky/unorthodox
~~did not try yet local_box, exec_box~~
not tested: full app on docker.

The result is like:

[core]
workspace_base = "/home/myuser/workspace"
max_iterations = 100
cache_dir = "/tmp/cache"
sandbox_container_image = "ghcr.io/opendevin/sandbox:latest"
run_as_devin = true
max_iterations = 100
sandbox_type = "ssh"
use_host_network = false
ssh_hostname = "localhost"
sandbox_timeout = 120
...

[llm]
api_key = "sk-proj-somethi..."
model = "gpt-3.5-turbo-1106"
api_version = "2023-01"
embedding_model = "openai"
...

[agent]
name = "MonologueAgent"
memory_enabled = true
memory_max_threads = 2

May 03 '24 13:05 enyst

I have a slightly different idea, inspired by my pain switching between GPT 3.5, 4, and Ollama:

[default]
workspace_base = "/home/myuser/workspace"
max_iterations = 100
cache_dir = "/tmp/cache"
sandbox_container_image = "ghcr.io/opendevin/sandbox:latest"
run_as_devin = true
max_iterations = 100
sandbox_type = "ssh"
use_host_network = false
ssh_hostname = "localhost"
sandbox_timeout = 120
...
[llama3]
model = "ollama/llama3"
embedding_model = "local"
...
[adhoc-experiment]
api_key = "sk-proj-somethi..."
model = "gpt-3.5-turbo-1106"
api_version = "2023-01"
embedding_model = "openai"

May 04 '24 04:05 li-boxuan

@li-boxuan Is this pain with the UI or without the UI?

I'm doing some switching these days, and with the UI I find it confusing that I'm not doing it unless I have to. Without the UI, it's much better, I have a bunch of commented and uncommented vars in .env. (The UI is changing to add more, if they're in a logical place it will make it better IMHO)

These are buckets of predefined LLMConfig. If with UI, I'm guessing it's actually the UI that would allow to pick. If it's without UI, we'll add a bit of complexity and maybe in a follow-up, but it seems just fine. A new command line parameter to control which bucket gets applied?

May 04 '24 09:05 enyst

Is this pain with the UI or without the UI?

Both I guess. Not really pain; just some inconvenience. Imagine how smooth it would be, if we can use a cmdline arg to control which "group config" to choose. As you said, nowdays you need to comment & uncomment a bunch of configs. From GUI standpoint, it would be awesome if one can choose a predefined group of configs.

Anyways, just a random thought of mine and orthogonal (maybe?) to this PR.

May 04 '24 16:05 li-boxuan

Codecov Report

Attention: Patch coverage is 82.25256% with 52 lines in your changes are missing coverage. Please review.

:exclamation: No coverage uploaded for pull request base (main@242c4a0). Click here to learn what that means.

Files	Patch %	Lines
opendevin/core/config.py	82.82%	28 Missing :warning:
opendevin/server/agent/agent.py	0.00%	7 Missing :warning:
opendevin/server/listen.py	0.00%	7 Missing :warning:
opendevin/core/main.py	44.44%	5 Missing :warning:
opendevin/runtime/docker/exec_box.py	88.23%	2 Missing :warning:
opendevin/runtime/docker/ssh_box.py	92.59%	2 Missing :warning:
opendevin/controller/action_manager.py	75.00%	1 Missing :warning:

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1552   +/-   ##
=======================================
  Coverage        ?   62.86%           
=======================================
  Files           ?       94           
  Lines           ?     3867           
  Branches        ?        0           
=======================================
  Hits            ?     2431           
  Misses          ?     1436           
  Partials        ?        0

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

May 05 '24 00:05 codecov-commenter

@li-boxuan The more I thought about your suggestion, the more it wanted to be done... It makes things so much easier! So it's implemented. Now I can save debug configurations in vscode with full predefined llm settings and just pick one. 😆

May 07 '24 01:05 enyst

@rbren @li-boxuan

Added: refactored globals in use in sandboxes into properties. Shouldn't change (desirable) behavior...Unless it might?
Integration tests for sandboxes were executing with almost all config vars set to None, thus missing things like the directory for docker so it was not mounting anything. Could that possibly have been intended or known? It was a surprise to me and it's fixed in this PR, since types are enforced enough to not allow None for those paths and the attributes are applied.

Please advise if I'm missing something.

May 08 '24 22:05 enyst

thus missing things like the directory for docker so it was not mounting anything

I am not sure if I understand but that doesn't sound right. run-integration-tests.yml defines the following variables:

SANDBOX_TYPE: ${{ matrix.sandbox }}
AGENT: ${{ matrix.agent }}
MAX_ITERATIONS: 10
LLM_EMBEDDING_MODEL: ${{ matrix.embedding-model }}
WORKSPACE_BASE="$GITHUB_WORKSPACE/workspace"
WORKSPACE_MOUNT_PATH="$GITHUB_WORKSPACE/workspace"

It was a surprise to me and it's fixed in this PR

Do you mind pointing me to the fix? Which commit/file/line?

May 09 '24 01:05 li-boxuan

Integration tests for sandboxes were executing with almost all config vars set to None

My guess is this was true when running locally, but they were set in the GitHub actions (and in regenerate.sh)

May 09 '24 13:05 rbren

I mean these tests: https://github.com/OpenDevin/OpenDevin/blob/a60a6a40d6ea21994b0501c7afb672385a94f14c/tests/unit/test_sandbox.py#L45 My bad, my brain didn't quite map running docker with user commands as 'unit' so I called them integration tests. 😅

This is None at runtime: https://github.com/OpenDevin/OpenDevin/blob/a60a6a40d6ea21994b0501c7afb672385a94f14c/opendevin/runtime/docker/ssh_box.py#L415

The reason why it isn't anymore on this branch is that all the paths default to something, the type for all paths is string, without | None. We don't remove those strings later just for tests either. Should I make these tests work on None paths?

May 09 '24 14:05 enyst

Looks like this is good to go! Just need to fix the conflicts

May 09 '24 15:05 rbren

I think having defaults for the paths makes sense. And yeah--integration is probably the right word for those 😄

May 09 '24 15:05 rbren