OpenHands [Feature] Refactor runtime executor for better extensibility/configurability

Overview

This PR refactors the action_execution_server and ActionExecutor, to facilitate extensibility and customization of the executor within the runtime. In particular:

Definition of the executor class is moved out of the action_executor_server.py script, leaving it to focus on the API server implementation.
The ActionExecutor class has been refactored into a hierarchy:
- RuntimeExecutor in runtime/executor/base.py handles the setup/teardown of the common resources: bash session, browser, plugins, etc.
- BaseActionExecutor in runtime/executor/action_executor.py adds the dynamic run_action dispatcher.
- ActionExecutor in the same file, adds the detailed implementation for all of the standard runners (run_ipython, read, browse, etc).

Importantly, the executor class to be used as been added to the AppConfig and can be used to specify an arbitrary class (even from an external module) to provide the executor implementation, as long as it implements the base interface. For example, you can set runtime_executor = "my_package.some_module:MyCustomExecutor".

Note that I have not changed any code/functionality of the existing ActionExecutor, just distributed its code around into the three-class hierarchy. There are minor changes elsewhere in the code to accommodate the new configuration setting.

Rationale

This change makes it possible to easily extend/override/customize the builtin executor implementation, without having to operate on core classes in the OpenHands repo (or even make change to the repo at all).

Caveats

In the exiting ActionExecutor, there is a block of code labeled "TODO" in the ainit method, which evidently is meant to be removed pending an unrelated refactoring operation. This temporary block of code depends on the run_ipython method, which is not implemented in the base class. Rather that introducing additional abstractions to accommodate this temporary code, I have elected to leave it in place in the base implementation. However this means executor classes will fail to initialize unless they provide the run_ipython method (as the default ActionExecutor does).

A more general fix would be to add an overridable hook method that can be called in the ainit function, which I'm happy to add if deemed necessary.

In the the base RuntimeExecutor in runtime/executor/base.py, there

End-user friendly description of the problem this fixes or functionality that this introduces

[x] Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Runtime executor can now be subclassed to provide custom or extended functionality. The executor class used in the runtime can be configured with:

[core]
runtime_executor = "your_package.your_module:YourCustomExecutorClass"

Give a summary of what the PR does, explaining any non-trivial design decisions

Link of any specific issues this addresses

Jan 15 '25 20:01 cshimmin

Hmm I think we've gone one step too far down the abstraction rabbit hole here

The Runtime class was meant to be something like your ActionExecutor here. Then we created a version of the Runtime that starts a remote ActionExecutionServer, and interfaces w/ it via HTTP. And now all the existing Runtime implementations do this.

What's the end goal here? What exactly are you trying to customize?

Jan 17 '25 20:01 rbren

We are building a custom Agent that has its own Actions. So we have our own classes in agenthub, and in runtime/impl. Those are fairly extensible, in that they don't really care what kind of Action they are passing around. But the ActionExecutor is basically hard-coded to handle only the Actions in the base implementation. We can extend the ActionExecutor, and add run_<custom-action>() methods, but there's no way to tell OH to use that derived class; it's hard coded to use only ActionExecutor in the action_execution_server.py. Which means for now, we are hacking directly on this file, which sees quite a lot of updates each release.

Obviously we could implement a custom runtime that doesn't use action_execution_server, but there's a lot of useful infrastructure (i.e. the fastapi service, which is entirely defined in the if __name__ == "__main__" block, so we can't reuse it by importing), that works totally fine for our purposes. So, I have added a command line argument to the action_execution_server that lets you tell what class you'd like to use. I can remove the changes to ActionExecutor, and leave the class definition in this executable script, if you like.

The main thing we really need is the ability to tell the execution server what class to instantiate as the ActionExecutor, so I'm happy to pare it back to just that if you feel strongly about it. The rest of the refactoring was just about separating the runtime setup of the ActionExecutor from the per-action handlers. We either redefine, or don't need, those run_*() handlers. But they don't really get in our way so it's not a big deal.

FWIW I was also going to propose a PR to make the Agent and Runtime classes configurable by specifying an import path. Currently the runtime instantiation is handled with a string comparison lookup in runtime/__init__.py, which means at a minimum that file has to be hacked to support customizations. The situation with Agent is a bit better, but you still have to hack at some entry point (e.g. agenthub/__init__.py, or an alternative to server/listen.py) to ensure you call register() on your custom Agent. In both cases, it would be nice to be able to simply add (or append) package.module:ClassName in the config file and ensure the requested implementation gets instantiated or registered at the appropriate time.

Jan 18 '25 00:01 cshimmin

@cshimmin Just a quick note on one aspect: regarding the agent, I do think we should fix this. You're right, the code is almost where we need it, it's just hard-codes names instead of loading whatever it is in agenthub, or alternatively your suggestion. If you want to submit a PR on that, it will be welcome!

I'll defer to @rbren on the runtime design, but thank you for raising these points, they're worth thinking about, and sorry for the merge conflicts!

Jan 18 '25 02:01 enyst

@enyst sure thing, I'll send a PR for that tomorrow likely.

Jan 18 '25 02:01 cshimmin

@rbren , where did we land on this? Would you like me to redo this PR to include only the config option for specifying the ActionExecutor class? I can leave the existing ActionExecutor class implementation as-is and of course keep it as the default for backwards compatibility.

Jan 21 '25 16:01 cshimmin

CC @xingyaoww

I'm a bit hesitant to expand our API here to allow custom actions. Can you say more about what sorts of actions you're adding here? It's not quite as simple as "add new methods to the runtime class"--we'd need to add new Event subclasses, manage migrations for those events, etc...

I'm not sure the maintenance burden is worth it here. But if we can understand your use case better I'd be willing to consider it, especially if other users are likely to get use out of it

Jan 21 '25 18:01 rbren

I'm a bit hesitant to expand our API here to allow custom actions.

Yeah, echoing Robert on this one -- We intentionally keep the list of action small enough for ease of maintenance.

I'd suggest considering using existing Action, if you want more functionality. E.g., if you want to build a download file for agent, consider implement it using CmdRunAction("wget {THINGS_THE_AGENT_WANT_TO_DOWNLOAD}").

Here's an actual example of how we use it now -- we implement FileEditAction by pip install https://github.com/All-Hands-AI/openhands-aci and interpret Tool call to IPythonRunCellAction.

https://github.com/All-Hands-AI/OpenHands/blob/7df7f43e3cbe5eebcae1c06293689d81ebfd7104/openhands/agenthub/codeact_agent/function_calling.py#L480-L486

This change makes it possible to easily extend/override/customize the builtin executor implementation, without having to operate on core classes in the OpenHands repo (or even make change to the repo at all).

Is the primary motivation for extending builtin executor to expand the list of actions? If so, would the above way of executing custom actions be able to solve the existing problems?

Regarding custom ActionExecutor.

I'm a little hesitant to add this additional layer of abstract that makes the codebase more complex / less maintainable (it's already so complex 😢 ) without significant benefits.

Creating the ability to use "custom action executor" could also cause a lot of potential issues, especially when the main OpenHands codebase has some update that wasn't handled by the "custom action executor," and it will be a lot harder to figure out / investigate.

We'd hope to put most core components (runtime/agent implementation) we have in one place (i.e., this OSS repo) so it is easy to thoroughly test them and maintain them.

For extending the default actions, i'd suggest building third-party tools (e.g., pip-installable packages) and instruct agent to directly use them (via Python/Bash) per the philosophy of CodeAct :)

Jan 21 '25 18:01 xingyaoww

Thank you @rbren @xingyaoww for your detailed responses, and apologies for the slow reply. I totally understand your objections in terms of the surface area. We have been using OH in headless mode (actually, interacting with it via slack bot). And we're not specifically using CodeActAgent, but rather a custom agent that handles some custom in-house actions. That is, we've been using OH mostly as a framework for agent/container orchestration.

Initially it seemed like the codebase is naturally structured for this kind of generic use. We've just subclassed Agent, Action, Observation, and Runtime classes and largely have no problem (except for configuring it to actually use our classes, as mentioned in the original issue description). Looking at the codebase more closely, I see that even though the architecture is generic, the event/action schema is specialized towards CodeActAgent in particular. Even though the system as a whole works with generics, I guess the schema-ification is to support the frontend presentation?

As for accomplishing everything with bash/ipython: well, okay, point taken. But in some sense having access to bash is "compute complete", so of course we can do anything that way. But since we have a specialized agent that does certain specific things, it makes sense to just have the executor handling them. One example is anything that maintains state: the executor has the same lifetime as the runtime, so it's very convenient to just have it handle state as well. I get that this is totally outside the use case of CodeActAgent, but the point of this patch is that kind of functionality can be tacked on without having to also rewrite all the HTTP client/server stuff that's already very nicely handled in the action_execution_server.

Anyhow, the patch I've submitted here would certainly make it easier to customize what our agent can do inside the runtime, without having to reinvent certain wheels. It just factorizes the HTTP client/server and some of the base Executor logic with (IMO) low abstraction overhead. But if you really think it would be a maintenance liability, we respect your stance on that, and can work around it.

As for the custom Action/Observation/Runtime stuff we're doing, we accept that it is outside the realm of what should be supported by OH upstream, but since the overall architecture is quite amenable to generic use it doesn't seem to be a problem for us going forward. Unless the plan is for OH to index even further and specialize on CodeActAgent which, IMO, would be an unfortunate loss of potential for a very well designed framework! And who knows what the "next CodeAct paper" will teach us.

In any case, it's been a pleasure to use OH and thank you for the great work! 👐

Feb 10 '25 00:02 cshimmin

I asked OpenHands to fix the conflict, @cshimmin feel free to cherry-pick this commit if you think that looks correct: https://github.com/All-Hands-AI/OpenHands/commit/e29a4147f28242109ba2395e8242d6f7007a81ae

Feb 24 '25 16:02 xingyaoww

Was pointed at this since in #6955 I'm trying to make the other side of Runtimes configurable. My impression FWIW:

Generally, this seems like a positive change. A file with too much responsibility has been broken down and made more modular. There will probably be plenty more to say on how we can enable custom agents, what interface is right, but this is a useful step.

My main concern is how we can be confident this refactor hasn't broken anything. I think a fair number of tests indirectly hit this code, I'm not sure how well the edge cases may be covered, and how much real change was done here vs just moving things around.

I left a nit about the class resolution since we have a util for it.

There are also some questions I want to raise about how we handle inheritance and these configurable classes in the codebase as a whole, but I'll do that separately because this PR is reasonably following the other code's patterns.

Feb 26 '25 06:02 raymyers

Hey all, glad to hear that this approach can be viable for OH! Sorry just catching up on the activity last week, I'll work on bringing this PR up to date by fixing the conflicts.

Mar 05 '25 20:03 cshimmin

This PR is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

Apr 05 '25 02:04 github-actions[bot]

This PR was closed because it has been stalled for over 30 days with no activity.

Apr 12 '25 02:04 github-actions[bot]