Implement structured planning & evaluation
- #6964
Discussion
Before we can start managing the workflow of the agent, we have to give it some more structure. The different ways to implement this can generally be subdivided into 3 groups:
💬 Your input is appreciated! If you think something important is missing from this comparison, please leave a comment below or bring it up on Discord!
1. Planning mechanism that is controlled by the agent
   - a. Let the agent manage its plan and to-do list through commands
     This approach seems non-ideal for a couple of reasons:
     - Adds commands to the executive prompt, increasing its complexity
     - Requires a full step for any self-management action → adds significant overhead
   - b. Add (an) output field(s) to the executive prompt through which the agent can manage its plan / to-do list
     More efficient than 1a, but unsure how well this will work:
     - Does not require extra steps just for self-management → limited additional overhead
     - This adds yet another function to the already complex prompt. Multi-purpose prompts are usually outperformed by a combination of multiple single-purpose prompts.
2. Planning mechanism that is part of the agent loop
   - a. Extra step in the agent’s loop to evaluate progress and update the plan / to-do list.
     Good separation of concerns, but adding a sequential LLM-powered step to a loop will slow down the process.
     - Additional step = additional latency in every cycle
     - Separate unit with its own (focused) prompt → probably easier to tune compared to 1a/1b
   - b. Parallel thread that evaluates the agent’s performance and intervenes / gives feedback when needed.
     A compromise, trading sequential (= simple to implement) evaluation for zero additional latency:
     - Separate unit with its own (focused) prompt → probably easier to tune compared to 1a/1b
     - Asynchronous/parallel evaluation
       - zero/low additional latency
       - eval for step i will arrive after executing step i+1 → “wasted” resources + time on cycle i+1 if the evaluator decides to intervene
   - c. Add output fields to the executive prompt through which the agent communicates its qualitative & quantitative assessment of the previous step.
     Quantitative output can be used directly to trigger an evaluation/intervention as in 2a, saving resources when this isn’t needed (see the sketch below this list).
   - Possible addition to a. and b.: rewind the agent’s state to an earlier point in time and adjust its parameters if an approach proves unproductive.

   💡 Note: A component that evaluates the agent’s performance and provides it with feedback or other input is also in a good position to manage its directives.
3. Planning mechanism that controls the agent
   - a. Compose a plan consisting of subtasks & employ the agent to complete the subtasks
     - i. Contextless (planning without knowledge of the agent’s abilities)
     - ii. Contextful (planning with knowledge of the agent’s abilities)

Implementing planning outside the agent offers interesting possibilities, and may add value regardless of the agent’s own planning capabilities and mechanism. This could be a logical next step towards multi-agent orchestration and/or tackling problems of arbitrary complexity.
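To make option 2c a bit more concrete, below is a minimal sketch of what the extra self-assessment output fields and the resulting trigger could look like. The field names, the response model, and the 0.5 threshold are illustrative assumptions, not the existing executive prompt schema.

```python
from pydantic import BaseModel, Field


class StepSelfAssessment(BaseModel):
    """Hypothetical extra output fields for the executive prompt (option 2c)."""

    # Qualitative: the agent's own one-sentence assessment of the previous step.
    progress_summary: str = Field(description="How did the previous step go?")
    # Quantitative: confidence (0..1) that the current approach is working out.
    approach_confidence: float = Field(ge=0, le=1)
    # Does the agent expect the next step to complete the current (sub)task?
    is_final_step: bool = False


def should_run_evaluation(assessment: StepSelfAssessment, threshold: float = 0.5) -> bool:
    """Trigger the separate planning/evaluation step (2a) only when needed."""
    return assessment.approach_confidence < threshold or assessment.is_final_step
```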
Proposal
We propose a combination of solutions [2a] and [2c] based on the comparison above.
Why
- [2a] provides for a relatively simple implementation
- The additional latency of [2a] can be mitigated by using faster models. With narrowly scoped prompts, this shouldn’t be a problem.
- The average additional latency of [2a] is reduced by combining it with [2c], only running the evaluation/planning step when necessary
- [2c] doesn’t add much complexity to the existing executive prompt
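As a rough illustration of how [2a] and [2c] could fit together, here is a sketch of the amended loop. The function names (`propose_and_execute_step`, `evaluate_and_replan`), the `Plan` structure and the trigger condition are assumptions made for the sketch, not the current Agent interface.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Plan:
    """Hypothetical plan / to-do list maintained by the evaluation step [2a]."""
    subtasks: list[str] = field(default_factory=list)
    current: int = 0


def run_agent_loop(
    propose_and_execute_step: Callable[[str], dict],
    evaluate_and_replan: Callable[[Plan, dict], Plan],
    plan: Plan,
    budget: int = 25,
) -> Plan:
    """Amended loop: the planning/evaluation step [2a] runs only when the
    agent's self-assessment output [2c] indicates it is needed."""
    for _ in range(budget):
        if plan.current >= len(plan.subtasks):
            break  # plan exhausted
        # Regular cycle: the executive prompt proposes and executes one step,
        # returning its self-assessment fields [2c] alongside the result.
        assessment = propose_and_execute_step(plan.subtasks[plan.current])
        # Conditional [2a]: evaluate progress and update the plan only when the
        # approach looks shaky or the current subtask is about to be finished.
        # In this sketch the evaluator is also responsible for advancing `plan.current`.
        if assessment["approach_confidence"] < 0.5 or assessment["is_final_step"]:
            plan = evaluate_and_replan(plan, assessment)
    return plan
```

On cycles where the self-assessment looks fine, the loop pays no extra latency; when the evaluation step does run, it can use a smaller/faster model, as noted above.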
To do
1. Work out:
   - i. Once we have a list of subtasks, how do we make the agent execute it?
     e.g. do we replace the agent’s `task` in the prompt by the current subtask? (see the sketch below this list)
   - ii. Should subtasks be specified like actionables or like sub-objectives, or both? Does it make a difference?
2. Sketch amended agent loop diagram
3. Prototype planning+evaluation prompt(s) (in a Jupyter notebook)
4. Implement [2a]
5. Work out:
   - i. What are useful indicators for how a task is progressing?
     - "Is your approach to the problem working out?" → if not, evaluate & adjust plan
     - "Is this the last step needed to complete the [sub]task?" → if so, verify & move on to next subtask
6. Implement [2c]
   - Add thought outputs corresponding to the indicators from [5.i] to `OneShotPromptStrategy` (AKA the executive prompt)
   - Implement mechanism to only run planning/evaluation when needed
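For item [1.i], a minimal sketch of one possible answer: swap the agent’s `task` for the current subtask, optionally keeping the parent objective visible. The function and parameter names are hypothetical; only the trade-off (actionable vs. sub-objective framing) comes from the open question above.

```python
def task_for_prompt(
    parent_task: str, subtasks: list[str], index: int, as_sub_objective: bool = True
) -> str:
    """Build the `task` string for the executive prompt from the current subtask.

    as_sub_objective=True keeps the parent objective in view (sub-objective framing);
    False hands the agent only the bare subtask (actionable framing).
    """
    subtask = subtasks[index]
    if as_sub_objective:
        return f"{parent_task}\n\nCurrent sub-objective ({index + 1}/{len(subtasks)}): {subtask}"
    return subtask
```

Prototyping both framings in the planning/evaluation notebook (item 3) should show whether the distinction matters in practice.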
Related
Earlier issues regarding improvements to tasking/planning:
- Step back periodically to asses approach #305
- Update Planner Interface #3790
- Adaptability Challenge : Dynamic prompting #3937
- Separation of execution and planning into different agents #3593
- Not aware of past command+arguments; often enters "forever loops" and repeats the same actions #3644
- Let's discuss : Multiple projects & Agent teaming #3392
- DRAFT : Project Abstraction Layer #3549
This issue has been hijacked by a maintainer to replace+expand the solution proposal. The original proposal by @Boostrix follows below.
BIFs are built-in functions (AKA commands), weighted by a `chat_with_ai()` instance for the given task at hand and their utility, and then recursively evaluated by adding them to a queue: https://github.com/Significant-Gravitas/Auto-GPT/issues/3933#issuecomment-1538470999
Motivation 🔦
At the very least, even without getting fancy, we should be able to get the LLM to provide a list of tasks, move those into a queue, and recursively call the LLM to provide sub-tasks and work those, pushing/popping as needed (possibly involving sub-stacks, which is kinda logical to do once you think about multiple agents: #3549)
This would need to take place with certain constraints:
- sequentially (for now)
- requirements
- constraints
- mutual dependencies
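A toy sketch of the queue-based recursion described above, assuming an `ask_llm_for_subtasks` stand-in for the `chat_with_ai()` call and an arbitrary depth limit as a safeguard; requirement/constraint/dependency handling is left out.

```python
from collections import deque


def work_goal(goal: str, ask_llm_for_subtasks, execute, max_depth: int = 3) -> None:
    """Recursively break a goal into sub-tasks and work them sequentially.

    ask_llm_for_subtasks(task) -> list[str]   # empty list = task is atomic
    execute(task) -> None                     # run the task via the agent
    """
    queue = deque([(goal, 0)])
    while queue:
        task, depth = queue.popleft()
        subtasks = ask_llm_for_subtasks(task) if depth < max_depth else []
        if subtasks:
            # Push sub-tasks to the front so they are worked before later siblings.
            queue.extendleft((sub, depth + 1) for sub in reversed(subtasks))
        else:
            execute(task)
```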
The following blog article goes into detail about our lack of planning: https://lorenzopieri.com/autogpt_fix/
Subgoaling, the creation of intermediate tasks to achieve the user defined goals, is crucial to task-directed AIs such as AutoGPT. If even just a single key subgoal is missing, the whole plan fails. The current architecture is very primitive here: just ask the LLM to break down the goal into subtasks! There is something poetic about the simplicity of this approach, which is basically “to get the solution, just ask the LLM the solution”. If you are familiar with the exploitation-exploration trade off, this is a case of going all-in on exploitation: the best plan should already be in the train data or very easy to extrapolate from it. This is rarely the case in interesting tasks, so we want to introduce the ability to explore different possible subgoals, the ability to plan. More on this later, now let’s focus on the creation of subgoals. Breaking down a goal into subgoals is an active area of research, spanning approaches such as Creative Problem Solving, Hierarchical Reinforcement Learning and Goal Reasoning. For instance using Creative Problem Solving we can tackle situations where the initial concept space (the LLM knowledge base) is insufficient to lay down all the subgoals. We can use Boden’s three levels of creativity: 1. Exploration of the universal conceptual space. 2. Combining concepts within the initial conceptual space or 3. Applying functions or transformations to the initial conceptual space.