Implement structured planning & evaluation
- #6964
Discussion
Before we can start managing the workflow of the agent, we have to give it some more structure. The different ways to implement this can generally be subdivided into 3 groups:
💬 Your input is appreciated! If you think something important is missing from this comparison, please leave a comment below or bring it up on Discord!
1. Planning mechanism that is controlled by the agent
   - a. Let the agent manage its plan and to-do list through commands
     This approach seems non-ideal for a couple of reasons:
     - Adds commands to the executive prompt, increasing its complexity
     - Requires a full step for any self-management action → adds significant overhead
   - b. Add (an) output field(s) to the executive prompt through which the agent can manage its plan / to-do list
     More efficient than 1a, but unsure how well this will work:
     - Does not require extra steps just for self-management → limited additional overhead
     - This adds yet another function to the already complex prompt. Multi-purpose prompts are usually outperformed by a combination of multiple single-purpose prompts.
2. Planning mechanism that is part of the agent loop
   - a. Extra step in the agent’s loop to evaluate progress and update the plan / to-do list.
     Good separation of concerns, but adding a sequential LLM-powered step to a loop will slow down the process.
     - Additional step = additional latency in every cycle
     - Separate unit with its own (focused) prompt → probably easier to tune compared to 1a/1b
   - b. Parallel thread that evaluates the agent’s performance and intervenes / gives feedback when needed.
     A compromise, trading sequential (= simple to implement) evaluation for zero additional latency:
     - Separate unit with its own (focused) prompt → probably easier to tune compared to 1a/1b
     - Asynchronous/parallel evaluation
       - zero/low additional latency
       - eval for step i will arrive after executing step i+1 → “wasted” resources + time on cycle i+1 if the evaluator decides to intervene
   - c. Add output fields to the executive prompt through which the agent communicates its qualitative & quantitative assessment of the previous step.
     Quantitative output can be used directly to trigger an evaluation/intervention as in 2a, saving resources when this isn’t needed (see the sketch below this list).
   - Possible addition to a. and b.: rewind the agent’s state to an earlier point in time and adjust its parameters if an approach proves unproductive.

   💡 Note: A component that evaluates the agent’s performance and provides it with feedback or other input is also in a good position to manage its directives.
3. Planning mechanism that controls the agent
   - a. Compose a plan consisting of subtasks & employ the agent to complete the subtasks
     - i. Contextless (planning without knowledge of the agent’s abilities)
     - ii. Contextful (planning with knowledge of the agent’s abilities)

Implementing planning outside the agent offers interesting possibilities, and may add value regardless of the agent’s own planning capabilities and mechanism. This could be a logical next step towards multi-agent orchestration and/or tackling problems of arbitrary complexity.
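To make option 2c a bit more concrete, below is a minimal sketch of what the extra self-assessment output fields and the resulting trigger could look like. The field names, the response model, and the 0.5 threshold are illustrative assumptions, not the existing executive prompt schema.

```python
from pydantic import BaseModel, Field


class StepSelfAssessment(BaseModel):
    """Hypothetical extra output fields for the executive prompt (option 2c)."""

    # Qualitative: the agent's own one-sentence assessment of the previous step.
    progress_summary: str = Field(description="How did the previous step go?")
    # Quantitative: confidence (0..1) that the current approach is working out.
    approach_confidence: float = Field(ge=0, le=1)
    # Does the agent expect the next step to complete the current (sub)task?
    is_final_step: bool = False


def should_run_evaluation(assessment: StepSelfAssessment, threshold: float = 0.5) -> bool:
    """Trigger the separate planning/evaluation step (2a) only when needed."""
    return assessment.approach_confidence < threshold or assessment.is_final_step
```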
Proposal
We propose a combination of solutions [2a] and [2c] based on the comparison above.
Why
- [2a] provides for a relatively simple implementation
- The additional latency of [2a] can be mitigated by using faster models. With narrowly scoped prompts, this shouldn’t be a problem.
- The average additional latency of [2a] is reduced by combining it with [2c], only running the evaluation/planning step when necessary
- [2c] doesn’t add much complexity to the existing executive prompt
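As a rough illustration of how [2a] and [2c] could fit together, here is a sketch of the amended loop. The function names (`propose_and_execute_step`, `evaluate_and_replan`), the `Plan` structure and the trigger condition are assumptions made for the sketch, not the current Agent interface.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Plan:
    """Hypothetical plan / to-do list maintained by the evaluation step [2a]."""
    subtasks: list[str] = field(default_factory=list)
    current: int = 0


def run_agent_loop(
    propose_and_execute_step: Callable[[str], dict],
    evaluate_and_replan: Callable[[Plan, dict], Plan],
    plan: Plan,
    budget: int = 25,
) -> Plan:
    """Amended loop: the planning/evaluation step [2a] runs only when the
    agent's self-assessment output [2c] indicates it is needed."""
    for _ in range(budget):
        if plan.current >= len(plan.subtasks):
            break  # plan exhausted
        # Regular cycle: the executive prompt proposes and executes one step,
        # returning its self-assessment fields [2c] alongside the result.
        assessment = propose_and_execute_step(plan.subtasks[plan.current])
        # Conditional [2a]: evaluate progress and update the plan only when the
        # approach looks shaky or the current subtask is about to be finished.
        # In this sketch the evaluator is also responsible for advancing `plan.current`.
        if assessment["approach_confidence"] < 0.5 or assessment["is_final_step"]:
            plan = evaluate_and_replan(plan, assessment)
    return plan
```

On cycles where the self-assessment looks fine, the loop pays no extra latency; when the evaluation step does run, it can use a smaller/faster model, as noted above.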
To do
1. Work out:
   - i. Once we have a list of subtasks, how do we make the agent execute it?
     e.g. do we replace the agent’s `task` in the prompt by the current subtask? (see the sketch below this list)
   - ii. Should subtasks be specified like actionables or like sub-objectives, or both? Does it make a difference?
2. Sketch amended agent loop diagram
3. Prototype planning+evaluation prompt(s) (in a Jupyter notebook)
4. Implement [2a]
5. Work out:
   - i. What are useful indicators for how a task is progressing?
     - "Is your approach to the problem working out?" → if not, evaluate & adjust plan
     - "Is this the last step needed to complete the [sub]task?" → if so, verify & move on to next subtask
6. Implement [2c]
   - Add thought outputs corresponding to the indicators from [5.i] to `OneShotPromptStrategy` (AKA the executive prompt)
   - Implement mechanism to only run planning/evaluation when needed
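For item [1.i], a minimal sketch of one possible answer: swap the agent’s `task` for the current subtask, optionally keeping the parent objective visible. The function and parameter names are hypothetical; only the trade-off (actionable vs. sub-objective framing) comes from the open question above.

```python
def task_for_prompt(
    parent_task: str, subtasks: list[str], index: int, as_sub_objective: bool = True
) -> str:
    """Build the `task` string for the executive prompt from the current subtask.

    as_sub_objective=True keeps the parent objective in view (sub-objective framing);
    False hands the agent only the bare subtask (actionable framing).
    """
    subtask = subtasks[index]
    if as_sub_objective:
        return f"{parent_task}\n\nCurrent sub-objective ({index + 1}/{len(subtasks)}): {subtask}"
    return subtask
```

Prototyping both framings in the planning/evaluation notebook (item 3) should show whether the distinction matters in practice.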
Related
Earlier issues regarding improvements to tasking/planning:
- Step back periodically to asses approach #305
- Update Planner Interface #3790
- Adaptability Challenge : Dynamic prompting #3937
- Separation of execution and planning into different agents #3593
- Not aware of past command+arguments; often enters "forever loops" and repeats the same actions #3644
- Let's discuss : Multiple projects & Agent teaming #3392
- DRAFT : Project Abstraction Layer #3549
This issue has been hijacked by a maintainer to replace+expand the solution proposal. The original proposal by @Boostrix follows below.
BIFs are built-in functions (AKA commands), weighted by a `chat_with_ai()` instance for the given task at hand and their utility, and then recursively evaluated by adding them to a queue: https://github.com/Significant-Gravitas/Auto-GPT/issues/3933#issuecomment-1538470999
Motivation 🔦
At the very least, even without getting fancy, we should be able to get the LLM to provide a list of tasks, move those into a queue, and recursively call the LLM to provide sub-tasks and work those, pushing/popping as needed (possibly involving sub-stacks, which is kinda logical to do once you think about multiple agents: #3549)
This would need to take place with certain constraints:
- sequentially (for now)
- requirements
- constraints
- mutual dependencies
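A toy sketch of the queue-based recursion described above, assuming an `ask_llm_for_subtasks` stand-in for the `chat_with_ai()` call and an arbitrary depth limit as a safeguard; requirement/constraint/dependency handling is left out.

```python
from collections import deque


def work_goal(goal: str, ask_llm_for_subtasks, execute, max_depth: int = 3) -> None:
    """Recursively break a goal into sub-tasks and work them sequentially.

    ask_llm_for_subtasks(task) -> list[str]   # empty list = task is atomic
    execute(task) -> None                     # run the task via the agent
    """
    queue = deque([(goal, 0)])
    while queue:
        task, depth = queue.popleft()
        subtasks = ask_llm_for_subtasks(task) if depth < max_depth else []
        if subtasks:
            # Push sub-tasks to the front so they are worked before later siblings.
            queue.extendleft((sub, depth + 1) for sub in reversed(subtasks))
        else:
            execute(task)
```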
The following blog article goes into detail about our lack of planning: https://lorenzopieri.com/autogpt_fix/
Subgoaling, the creation of intermediate tasks to achieve the user defined goals, is crucial to task-directed AIs such as AutoGPT. If even just a single key subgoal is missing, the whole plan fails. The current architecture is very primitive here: just ask the LLM to break down the goal into subtasks! There is something poetic about the simplicity of this approach, which is basically “to get the solution, just ask the LLM the solution”. If you are familiar with the exploitation-exploration trade off, this is a case of going all-in on exploitation: the best plan should already be in the train data or very easy to extrapolate from it. This is rarely the case in interesting tasks, so we want to introduce the ability to explore different possible subgoals, the ability to plan. More on this later, now let’s focus on the creation of subgoals. Breaking down a goal into subgoals is an active area of research, spanning approaches such as Creative Problem Solving, Hierarchical Reinforcement Learning and Goal Reasoning. For instance using Creative Problem Solving we can tackle situations where the initial concept space (the LLM knowledge base) is insufficient to lay down all the subgoals. We can use Boden’s three levels of creativity: 1. Exploration of the universal conceptual space. 2. Combining concepts within the initial conceptual space or 3. Applying functions or transformations to the initial conceptual space.