AutoGPT
[DRAFT] Wizard mode (placeholder for #3820)
!! DO NOT MERGE !! This is just to gather feedback and interest, and to inform people interested in similar stuff. Based on the following idea: https://github.com/Significant-Gravitas/Auto-GPT/issues/3820 And inspired by this nifty example https://github.com/Significant-Gravitas/Auto-GPT/issues/2775#issuecomment-1517703255 created by @adam-paterson
Currently, the Auto-GPT project is being re-architected, and that also involves the plugin system, so this is not a good time to start writing a plugin or making changes to the system as a whole. Instead, this is primarily meant to encourage public brainstorming with code snippets.
Motivation: https://github.com/Significant-Gravitas/Auto-GPT/issues/3820#issuecomment-1535732259
It is possible that the wizard framework will end up being a separate plugin/repo - but for now this is just to gather interest and feedback, to see if it's worth the hassle of coding this up or whether it's simply not needed.
I started experimenting with a dedicated new command to dynamically set up "wizard" workflows in JSON and then traverse those workflows using a "prompter" (CLI or GUI). This can be invoked by the agent, but can also invoke the LLM with inputs - it's basically a handful of new commands that expose this to the LLM, so that it can dynamically set up workflows (in JSON) and afterwards execute them (possibly repeatedly).
This is all pretty standard/simple stuff, no fancy code at all:
#3911
The generated HTML page then looks like this:
Also, recursion being recursion, I asked the agent to generate a wizard to generate a new UI based on the same idea:
#765
Use Cases
We could come up with custom wizards for all sorts of purposes, such as configuring Auto-GPT itself, but also for related workflows (challenges, unit tests, etc.):
- could be used to procedurally generate a ton of test cases/benchmarks
- would allow people to share their workflows/processes with others
- #3835
- #3883
- etc
Background
In #3820 the idea was to provide an interactive wizard for configuration purposes, to help with creating/maintaining various config files like ai_settings.yaml and the env file. In a (very limited) sense, the initial prompt asking a few questions to create ai_settings.yaml is already a simple "wizard" (a few questions that end up affecting a file on disk).
Subsequently, we talked about generalizing the concept to introduce a dedicated mechanism to formalize "workflows" using wizards. These would by default be serialized/stored in JSON format, so people could easily share their wizard/workflow configs.
Furthermore, these wizards would end up using the agent/command system - so they would be interactive shell scripts on steroids: basically asking a bunch of questions and walking the user (or the agent!) through a process/workflow to accomplish a certain goal.
In theory, a finished wizard config could also be registered as a command option for the agent, so that other agents could also use the same mechanism (then using the inter-agent messaging API).
For now, however, the primary use case would be to dynamically generate wizards for certain workflows using the agent, and then share those with other users so that they can extend the capabilities of their system, without it having to call CLI commands and without the system having to write/execute Python code dynamically.
In simple terms, a wizard would consist of "pages" (imagine a list of questions with associated validation routines), where each page would consist of "elements" (passive ones like text/labels or active ones requiring input). The data structure here would be a linked list to chain arbitrary sets of inputs together, but in between each step (page), there would be the option to invoke validation routines and trigger the agent to process the input.
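To make the pages/elements idea a bit more concrete, here is a minimal sketch of what such a data model could look like in Python - purely illustrative, none of these class or field names exist in the code base:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Element:
    """A single element on a wizard page: passive text or an input field."""
    kind: str                                          # e.g. "label" or "input"
    prompt: str                                        # text shown to the user/agent
    validate: Optional[Callable[[str], bool]] = None   # optional per-input validation hook


@dataclass
class Page:
    """One step of the wizard: a set of elements plus an optional post-step hook."""
    title: str
    elements: list[Element] = field(default_factory=list)
    on_complete: Optional[Callable[[dict], None]] = None  # e.g. hand control back to the agent


@dataclass
class Wizard:
    """A chain of pages, traversed one step (page) at a time."""
    name: str
    pages: list[Page] = field(default_factory=list)

    def run(self, ask: Callable[[str], str]) -> dict:
        """Walk through all pages, collecting and validating answers."""
        answers: dict[str, str] = {}
        for page in self.pages:
            for element in page.elements:
                if element.kind != "input":
                    continue
                value = ask(element.prompt)
                while element.validate and not element.validate(value):
                    value = ask(f"Invalid input, please try again: {element.prompt}")
                answers[element.prompt] = value
            if page.on_complete:
                page.on_complete(answers)
        return answers
```

The `on_complete` hook is where control would be handed back to the agent/LLM between pages, and each input element can carry its own validation routine.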
For the time being, just imagine the existing "wizard" to generate the ai_settings.yaml file - but with more hooks, to also affect the env file, or to provide an option to write the new config to disk (via the equivalent of showing/running the corresponding sudo command).
However, the use case would not be restricted to just setting up the Auto-GPT install, since the same mechanism could be used to formalize workflows internally - whenever a workflow has a well-defined set of questions (think location, file name, file type), a wizard could gather and validate the corresponding data. And it could do so in an agent-agnostic fashion, i.e. by offering the same mechanism to sub-agents.
A potential use case would be a website builder - currently, Auto-GPT can definitely build websites, but it basically does so by experimenting a lot and doing a lot of research in parallel. A wizard-based approach could streamline/formalize such workflows and ask a bunch of questions, so that non-coders could also be walked through the process, even if they lack the coding lingo to steer the LLM in the right direction.
Thus, consider a "wizard" to be a pre-configured option to prime the LLM in a well-defined manner to arrive at a desired output. In principle, the idea here is to provide a mechanism so that people can more easily share working "workflows" without sharing their exact "projects" - so rather than having to copy/paste and adapt an existing working project, running a wizard would allow people to simply fill in THEIR custom details to generate the boilerplate of a project (think code/web projects etc.).
These "wizard configs" would by convention by required to be versioned JSON and could also be maintained in a repo to easily maintain/update wizards.
The other thing worth keeping in mind is that, under the hood, the whole prompt generator/prompt approach is already using a wizard-based setup anyway - however, given the general (and open-ended) nature of the command options offered to the LLM, there's often a combinatorial explosion involved when picking a certain strategy to arrive at a certain goal.
A wizard mode, however, could constrain these options by allowing the LLM (or a sub-agent) to "enter" a new mode, with fewer options being presented and the remaining ones being more relevant - including contextual information that is better suited for the given use case, because a special wizard would by definition know more about the underlying workflow than the current hard-coded command system possibly can.
Accordingly, wizards might invoke each other or call sub-wizards to obtain some data and trigger a certain workflow.
Feedback/thoughts? Anybody interested in fleshing out the details?
(more to follow, need to copy some comments out of #3820)
Goals
- improve usability for the CLI use-case, see: #1412
- better streamline workflows and processes, validate inputs (= less LLM thinking, i.e. lower API/token costs)
- allow wizards to be used for config file customization
- allow a wizard to be executed via a startup argument
- allow different front-ends to be used in conjunction with wizards (see the ongoing web UI/REST effort)
- this includes the ability for agents to make use of wizards (recursively), i.e. treat them like conventional commands internally - the difference being that a wizard is a "mode" and as such offers multiple steps to interactively select between options
Changes
WIP (thinking out loud)
- at least one additional lib to use an existing (headless) wizard framework
- a corresponding WizardAdapter class to be able to tinker with different frameworks/libs (rough sketch after this list)
- a new Wizard class including sub-classes for steps/pages (collections of prompts that belong together) and prompt-specific validation
- a custom PromptGenerator instance for the use-case at hand
- a custom Command Manager instance to only provide wizard related command options (and one option to leave the wizard mode)
- for starters, probably just a new set of files directly integrated into the code base to illustrate the concept
- a template engine to easily support templates for wizards/pages and steps (should support inheritance)
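A rough sketch of how the WizardAdapter and the constrained wizard-mode command set mentioned above could fit together - everything here is hypothetical and only meant to make the idea tangible:

```python
from abc import ABC, abstractmethod


class WizardAdapter(ABC):
    """Thin abstraction so different (headless) wizard frameworks/libs can be swapped out."""

    @abstractmethod
    def ask(self, prompt: str) -> str:
        """Present a prompt (CLI, web UI, ...) and return the raw answer."""

    @abstractmethod
    def show(self, text: str) -> None:
        """Display passive text (labels, instructions)."""


class CliWizardAdapter(WizardAdapter):
    """Minimal CLI implementation, just enough to experiment with."""

    def ask(self, prompt: str) -> str:
        return input(f"{prompt} ")

    def show(self, text: str) -> None:
        print(text)


class WizardModeCommands:
    """While in wizard mode, only a small, wizard-related command set is exposed to the LLM."""

    def __init__(self) -> None:
        self.commands = {
            "answer_prompt": "Answer the current wizard prompt",
            "exit_wizard": "Leave wizard mode and return to the normal command set",
        }

    def prompt_section(self) -> str:
        """Render the (much smaller) command list for the wizard-mode prompt."""
        return "\n".join(f"{name}: {desc}" for name, desc in self.commands.items())
```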
Commands
- new_wizard - create new wizards (again, might use the wizard API itself!)
- list_wizards - analogous to listing commands: returns the available wizards including a textual description
- start_wizard - takes the name of the wizard to be started and an optional context parameter
- exit_wizard - only provided when in wizard mode, to leave the wizard mode (rough signatures are sketched below)
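Roughly, these four commands could boil down to something like the following - the signatures are a sketch only and do not assume anything about the actual command registry of the re-architected code base:

```python
from typing import Optional

# Hypothetical in-memory registry of known wizard configs (name -> JSON-style dict).
WIZARDS: dict[str, dict] = {}


def new_wizard(name: str, config: dict) -> str:
    """Create/register a new wizard from a (versioned) JSON-style config."""
    WIZARDS[name] = config
    return f"Wizard '{name}' created with {len(config.get('pages', []))} page(s)."


def list_wizards() -> str:
    """Analogous to listing commands: wizard names plus a short textual description."""
    return "\n".join(
        f"{name}: {cfg.get('description', 'no description')}"
        for name, cfg in WIZARDS.items()
    )


def start_wizard(name: str, context: Optional[dict] = None) -> str:
    """Enter wizard mode for the named wizard, optionally seeded with some context."""
    if name not in WIZARDS:
        return f"Unknown wizard '{name}'."
    # ...switch the agent into wizard mode and hand over the config/context here...
    return f"Entering wizard mode for '{name}'."


def exit_wizard() -> str:
    """Only available while in wizard mode; restores the normal command set."""
    return "Left wizard mode."
```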
Thoughts
To use this in the context of creating/updating config files like the yaml/env files, we will need to support a special startup mode, so that Auto-GPT has access to a dedicated command to write those files outside the workspace, and/or simply show instructions on how to do so, or optionally execute the equivalent of "sudo" once everything is finished.
Documentation
not relevant at this point, just gathering feedback and wanting to inform people interested in similar functionality
Test Plan
not relevant at this point, just gathering feedback and wanting to inform people interested in similar functionality
PR Quality Checklist
- [ ] My pull request is atomic and focuses on a single change.
- [ ] I have thoroughly tested my changes with multiple different prompts.
- [ ] I have considered potential risks and mitigations for my changes.
- [ ] I have documented my changes clearly and comprehensively.
- [ ] I have not snuck in any "extra" small tweaks or changes
This PR exceeds the recommended size of 200 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size
I feel like access to a Python instance of Agent could suffice, at first stage. I thought that you might want to either have all the flow in ipython, or sometimes go to ipython for things.
> I feel like access to a Python instance of Agent could suffice, at first stage.
Right now, it's just a set of standalone files to illustrate the idea of running a wizard to customize a template and generate new output that way. There would be the equivalent of a WizardManager (not unlike the Command manager) to provide wizard-specific commands, so that people can list/run and create/edit wizards using the equivalent of the command system.
After each "step" (collating info from the user/agent), the wizard would pass control back to the LLM to offer a list of potential option for the next step, after which the next "page" (conceptually at least) of the wizard will be executed, collecting/validating more info.
The idea is for sub-agents to also be able to execute such wizards and, for instance, gather information from the parent agent using the agent messaging interface.
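To illustrate the intended control flow (collect a step, hand control back to the LLM, continue with the next page), here is a hedged sketch that reuses the hypothetical Page/Element shape from the earlier example; ask_user and ask_llm_for_next_step are placeholders for the prompter and the agent loop:

```python
def run_wizard_step_by_step(wizard, ask_user, ask_llm_for_next_step):
    """Traverse a wizard page by page, yielding control to the LLM after each step.

    `wizard` is assumed to expose an ordered list of pages, each with elements
    (inputs plus optional validators); `ask_user` and `ask_llm_for_next_step`
    stand in for the prompter and the agent loop respectively.
    """
    collected = {}
    for page in wizard.pages:
        # 1. Collect and validate the inputs for this page.
        for element in page.elements:
            if element.kind != "input":
                continue
            value = ask_user(element.prompt)
            while element.validate and not element.validate(value):
                value = ask_user(f"Please try again: {element.prompt}")
            collected[element.prompt] = value

        # 2. Hand control back to the LLM, which may pick the next option/branch.
        decision = ask_llm_for_next_step(page.title, collected)
        if decision == "abort":
            break
    return collected
```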
Codecov Report
Patch coverage has no change and project coverage change: +0.07% :tada:
Comparison is base (d184d0d) 60.92% compared to head (4d3634f) 60.99%.
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master    #3911      +/-   ##
==========================================
+ Coverage   60.92%   60.99%   +0.07%
==========================================
  Files          72       73       +1
  Lines        3304     3310       +6
  Branches      542      542
==========================================
+ Hits         2013     2019       +6
  Misses       1152     1152
  Partials      139      139
```
Status? :)
Didn't seem to get much traction / feedback at the time - like many other PRs 🙃