Use messages to drive tasks
This is adapted from https://github.com/OpenDevin/OpenDevin/pull/1646. It removes the change to CodeAct's prompt, but sets us up to add the <finish> command when ready
The main change here is that instead of having a core main_goal in the State object, we let the agent parse through user messages. Most agents are now defaulting to the most recent user message as the goal.
When the agent finishes, the user can now send a new message (e.g. asking for a tweak) which will preserve session history
One comment: it might be a bit rough to summarize the entire user intent into "latest user message". The latest user message may be a clarification message or something that isn't actually the full intent. I don't think this needs to be fixed in this PR, but maybe we could open a tracking issue to discuss more about how to summarize the user's intent when they have provided multiple messages?
Yeah this is totally true. CodeAct handles it just fine--the microagents not so much.
Will definitely take another pass at this in a future PR
Re: inheritance, I added some more notes to the docstring on Plan.
Ultimately, we have a tree structure representing the agent's plan, e.g.
- Do x
- Do x'
- Do x''
- Do x''1
- Do x''2
- Do y
- Do y'
- Do y''
- Do y'
- Do z
What's odd about this data structure is that the top-level is a list, not a single node. So to hold onto it, we need a sort of invisible root node. That's where Plan comes in.
Plan needs to be the same type as every other node, so that parent references work nicely. But it's also a special node that doesn't contain any information itself, hence the subclassing.
I'm open to other suggestions here, but this is the best I've come up with
Thanks @rbren , I'm still a bit confused though. Why couldn't we just use Task for the top-level node too?
We need to e.g. override the __init__ so that it doesn't try and set an id etc
What about something like this?
class Task:
def __init__(
self,
parent: 'Task' | None,
goal: str,
state: str = OPEN_STATE,
subtasks: list = [],
):
"""Initializes a new instance of the Task class.
Args:
parent: The parent task, or None if it is the root task.
goal: The goal of the task.
state: The initial state of the task.
subtasks: A list of subtasks associated with this task.
"""
if parent is None:
self.id = ''
elif parent.id:
self.id = parent.id + '.' + str(len(parent.subtasks))
else:
self.id = str(len(parent.subtasks))
self.parent = parent
self.goal = goal
self.subtasks = []
for subtask in subtasks or []:
if isinstance(subtask, Task):
self.subtasks.append(subtask)
else:
goal = subtask.get('goal')
state = subtask.get('state')
subtasks = subtask.get('subtasks')
self.subtasks.append(Task(self, goal, state, subtasks))
self.state = OPEN_STATE
then for Plan we call Task(None, '', OPEN_STATE, subtasks)?
Note that the only thing I changed is the first if parent is None part.
That works!
The bigger issue I think are the helper methods that are defined on Plan: get_task_by_id, add_subtask, and set_subtask_state. Calling these on a non-root Task would cause problems.
Two things I could do:
- Make these normal functions, instead of class methods, which take in a task param
- Put an
assertat the top of the function making sure the task it's called on is a root task
The downside is we lose type safety--the code can call those functions on a non-root task, and we'd just get a runtime error.
WDYT?
Thanks @rbren , I think the "assert" thing should probably be fine for now?
And I feel that (eventually, not in this PR) we might even be able to just set this up so there's not really any distinction between a top-level plan and a task at all. After all, tasks themselves can also have subtasks. I don't think there's anything really all that special about the one at the top of the hierarchy versus the ones in the middle of the hierarchy.
I wonder if you could have a base class, e.g. TaskNode, that a Task inherits from. Then the parent node is a TaskNode, and most of the planning items are of class Task.
edit: I think this is what you are already doing, but the terminology Plan makes it seem completely different than a Task, when it is just a more abstract version of the same idea.
we might even be able to just set this up so there's not really any distinction between a top-level plan and a task at all. After all, tasks themselves can also have subtasks.
agree with @neubig on this part
Two things I could do: Make these normal functions, instead of class methods, which take in a task param
Is it possible to maintain some sort of registry that hosts all these tasks - similar to what we are doing with Agent classes now, but not name them as Plan? Maybe like a TaskManager, and the sole purpose of that class is to manage the life-cycle of a Task.
🤔 what if we just rename Plan to RootTask?
I'd be OK with this, especially if it unblocks your work. But in principle:
- I'm not super-convinced that
Planis different enough fromTaskthat it's worth separating them in the first place - Plans are often hierarchical, such as in Step, and it's not clear that we should only allow
Plan-styleTasks in the root node.
^Same here. I'd hope Task themselves do not distinguish from each other (e.g., a tree node), but I'm fine with having a separate class managing all existing tasks (e.g., renaming Plan to TaskManager).
Codecov Report
Attention: Patch coverage is 61.64384% with 28 lines in your changes are missing coverage. Please review.
Project coverage is 62.60%. Comparing base (
755a407) to head (c9840a6). Report is 1 commits behind head on main.
:exclamation: Current head c9840a6 differs from pull request most recent head 757f546. Consider uploading reports for the commit 757f546 to get more accurate results
Additional details and impacted files
@@ Coverage Diff @@
## main #1688 +/- ##
==========================================
+ Coverage 62.53% 62.60% +0.07%
==========================================
Files 99 99
Lines 4057 4049 -8
==========================================
- Hits 2537 2535 -2
+ Misses 1520 1514 -6
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.