OpenHands Use messages to drive tasks

This is adapted from https://github.com/OpenDevin/OpenDevin/pull/1646. It removes the change to CodeAct's prompt, but sets us up to add the <finish> command when ready

The main change here is that instead of having a core main_goal in the State object, we let the agent parse through user messages. Most agents are now defaulting to the most recent user message as the goal.

When the agent finishes, the user can now send a new message (e.g. asking for a tweak) which will preserve session history

May 10 '24 13:05 rbren

One comment: it might be a bit rough to summarize the entire user intent into "latest user message". The latest user message may be a clarification message or something that isn't actually the full intent. I don't think this needs to be fixed in this PR, but maybe we could open a tracking issue to discuss more about how to summarize the user's intent when they have provided multiple messages?

Yeah this is totally true. CodeAct handles it just fine--the microagents not so much.

Will definitely take another pass at this in a future PR

May 10 '24 20:05 rbren

Re: inheritance, I added some more notes to the docstring on Plan.

Ultimately, we have a tree structure representing the agent's plan, e.g.

Do x
- Do x'
- Do x''
  - Do x''1
  - Do x''2
Do y
- Do y'
  - Do y''
Do z

What's odd about this data structure is that the top-level is a list, not a single node. So to hold onto it, we need a sort of invisible root node. That's where Plan comes in.

Plan needs to be the same type as every other node, so that parent references work nicely. But it's also a special node that doesn't contain any information itself, hence the subclassing.

I'm open to other suggestions here, but this is the best I've come up with

May 11 '24 13:05 rbren

Thanks @rbren , I'm still a bit confused though. Why couldn't we just use Task for the top-level node too?

May 11 '24 15:05 neubig

We need to e.g. override the __init__ so that it doesn't try and set an id etc

May 11 '24 15:05 rbren

What about something like this?

class Task:
    def __init__(
        self,
        parent: 'Task' | None,
        goal: str,
        state: str = OPEN_STATE,
        subtasks: list = [],
    ):
        """Initializes a new instance of the Task class.

        Args:
            parent: The parent task, or None if it is the root task.
            goal: The goal of the task.
            state: The initial state of the task.
            subtasks: A list of subtasks associated with this task.
        """
        if parent is None:
            self.id = ''
        elif parent.id:
            self.id = parent.id + '.' + str(len(parent.subtasks))
        else:
            self.id = str(len(parent.subtasks))
        self.parent = parent
        self.goal = goal
        self.subtasks = []
        for subtask in subtasks or []:
            if isinstance(subtask, Task):
                self.subtasks.append(subtask)
            else:
                goal = subtask.get('goal')
                state = subtask.get('state')
                subtasks = subtask.get('subtasks')
                self.subtasks.append(Task(self, goal, state, subtasks))

        self.state = OPEN_STATE

then for Plan we call Task(None, '', OPEN_STATE, subtasks)?

Note that the only thing I changed is the first if parent is None part.

May 11 '24 15:05 neubig

That works!

The bigger issue I think are the helper methods that are defined on Plan: get_task_by_id, add_subtask, and set_subtask_state. Calling these on a non-root Task would cause problems.

Two things I could do:

Make these normal functions, instead of class methods, which take in a task param
Put an assert at the top of the function making sure the task it's called on is a root task

The downside is we lose type safety--the code can call those functions on a non-root task, and we'd just get a runtime error.

WDYT?

May 12 '24 15:05 rbren

Thanks @rbren , I think the "assert" thing should probably be fine for now?

And I feel that (eventually, not in this PR) we might even be able to just set this up so there's not really any distinction between a top-level plan and a task at all. After all, tasks themselves can also have subtasks. I don't think there's anything really all that special about the one at the top of the hierarchy versus the ones in the middle of the hierarchy.

May 12 '24 22:05 neubig

I wonder if you could have a base class, e.g. TaskNode, that a Task inherits from. Then the parent node is a TaskNode, and most of the planning items are of class Task.

edit: I think this is what you are already doing, but the terminology Plan makes it seem completely different than a Task, when it is just a more abstract version of the same idea.

May 13 '24 00:05 reteps

we might even be able to just set this up so there's not really any distinction between a top-level plan and a task at all. After all, tasks themselves can also have subtasks.

agree with @neubig on this part

Two things I could do: Make these normal functions, instead of class methods, which take in a task param

Is it possible to maintain some sort of registry that hosts all these tasks - similar to what we are doing with Agent classes now, but not name them as Plan? Maybe like a TaskManager, and the sole purpose of that class is to manage the life-cycle of a Task.

May 13 '24 12:05 xingyaoww

🤔 what if we just rename Plan to RootTask?

May 13 '24 14:05 rbren

I'd be OK with this, especially if it unblocks your work. But in principle:

I'm not super-convinced that Plan is different enough from Task that it's worth separating them in the first place
Plans are often hierarchical, such as in Step, and it's not clear that we should only allow Plan-style Tasks in the root node.

May 13 '24 14:05 neubig

^Same here. I'd hope Task themselves do not distinguish from each other (e.g., a tree node), but I'm fine with having a separate class managing all existing tasks (e.g., renaming Plan to TaskManager).

May 13 '24 14:05 xingyaoww

Codecov Report

Attention: Patch coverage is 61.64384% with 28 lines in your changes are missing coverage. Please review.

Project coverage is 62.60%. Comparing base (755a407) to head (c9840a6). Report is 1 commits behind head on main.

:exclamation: Current head c9840a6 differs from pull request most recent head 757f546. Consider uploading reports for the commit 757f546 to get more accurate results

Files	Patch %	Lines
opendevin/controller/state/task.py	42.10%	11 Missing :warning:
agenthub/planner_agent/prompt.py	33.33%	4 Missing :warning:
opendevin/server/listen.py	0.00%	3 Missing :warning:
agenthub/codeact_agent/codeact_agent.py	75.00%	2 Missing :warning:
agenthub/delegator_agent/agent.py	0.00%	2 Missing :warning:
agenthub/planner_agent/agent.py	0.00%	2 Missing :warning:
opendevin/controller/agent_controller.py	75.00%	2 Missing :warning:
agenthub/SWE_agent/prompts.py	85.71%	1 Missing :warning:
opendevin/core/main.py	83.33%	1 Missing :warning:

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1688      +/-   ##
==========================================
+ Coverage   62.53%   62.60%   +0.07%     
==========================================
  Files          99       99              
  Lines        4057     4049       -8     
==========================================
- Hits         2537     2535       -2     
+ Misses       1520     1514       -6

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

May 13 '24 14:05 codecov[bot]