AutoGPT GPT-3 often uses commands incorrectly

⚠️ Search for existing issues first ⚠️

[X] I have searched the existing issues, and there is no existing issue for my problem

Which Operating System are you using?

MacOS

Which version of Auto-GPT are you using?

Latest Release

GPT-3 or GPT-4?

GPT-3.5

Steps to reproduce 🕹

My goal for Auto-GPT is: Make an algorithm for making normal map using 4 images of different light angles

Current behavior 😯

[Screenshot 2023-04-23 2:48:01 PM] It just duplicated the placeholder text from the system prompt: {'repository_url': '<url>', 'clone_path': '<directory>'}

Expected behavior 🤔

It should replace the placeholder text with an actual URL and path.

Your prompt 📝

Make an algorithm for making normal map using 4 images of different light angles

Your Logs 📒

2023-04-16 16:58:33,797 INFO Welcome to Auto-GPT!  Enter the name of your AI and its role below. Entering nothing will load defaults.
2023-04-16 16:58:34,292 INFO Name your AI:  For example, 'Entrepreneur-GPT'
2023-04-16 16:58:44,791 INFO Agent-GPT here! I am at your service.
2023-04-16 16:58:45,063 INFO Describe your AI's role:  For example, 'an AI designed to autonomously develop and run businesses with the sole goal of increasing your net worth.'
2023-04-16 16:59:01,039 INFO Enter up to 5 goals for your AI:  For example: 
Increase net worth, Grow Twitter Account, Develop and manage multiple businesses autonomously'
2023-04-16 16:59:06,890 INFO Using memory of type: LocalCache
2023-04-16 16:59:07,033 INFO Using Browser: chrome
2023-04-23 14:14:48,791 INFO Welcome to Auto-GPT!  Enter the name of your AI and its role below. Entering nothing will load defaults.
2023-04-23 14:14:49,222 INFO Name your AI:  For example, 'Entrepreneur-GPT'
2023-04-23 14:15:33,263 INFO GPT here! I am at your service.
2023-04-23 14:15:33,521 INFO Describe your AI's role:  For example, 'an AI designed to autonomously develop and run businesses with the sole goal of increasing your net worth.'
2023-04-23 14:15:47,623 INFO Enter up to 5 goals for your AI:  For example: 
Increase net worth, Grow Twitter Account, Develop and manage multiple businesses autonomously'
2023-04-23 14:19:40,884 INFO Using memory of type: NoMemory
2023-04-23 14:19:41,027 INFO Using Browser: chrome
2023-04-23 14:20:39,416 INFO  THOUGHTS: Based on the goal and constraints provided, the next command to use would be 'Generate Image' to create a normal map using 4 images of different light angles. I can use an algorithm to process the images and generate the normal map. I will need to research and analyze existing algorithms and techniques for normal map generation from multiple light angles. I should also consider the limitations of my short-term memory and save important information to files to avoid losing any critical data.
2023-04-23 14:20:40,109 INFO REASONING: The goal is to create a normal map using 4 images of different light angles, which requires an algorithm for image processing. The constraints state that I should avoid legal complications and not seek user assistance, and I should also be mindful of my short-term memory limitations. Thus, using an algorithm to generate the normal map is the most efficient and independent approach.
2023-04-23 14:20:40,746 INFO PLAN: 
2023-04-23 14:20:40,827 INFO -  Research and analyze existing algorithms and techniques for normal map generation from multiple light angles
2023-04-23 14:20:41,205 INFO -  Implement the algorithm for image processing to generate the normal map
2023-04-23 14:20:41,602 INFO -  Save important information to files to avoid losing critical data
2023-04-23 14:20:41,916 INFO -  Continuously review and analyze my actions to ensure performance optimization
2023-04-23 14:20:42,225 INFO CRITICISM: I should be mindful of my short-term memory limitations and avoid relying solely on it. It's important to save important information to files to avoid losing any critical data. I should also ensure that the algorithm I use for image processing is efficient and effective in generating accurate normal maps.
2023-04-23 14:20:42,868 INFO Attempting to fix JSON by finding outermost brackets
 
2023-04-23 14:20:43,107 INFO Apparently json was fixed. 
2023-04-23 14:20:43,280 INFO NEXT ACTION:  COMMAND = generate_image  ARGUMENTS = {'prompt': 'Use algorithm to generate normal map from 4 images of different light angles'}
2023-04-23 14:22:15,212 INFO SYSTEM:  Human feedback: I already have these images, so you don't need to generate images
2023-04-23 14:23:20,920 WARNING  Warning: Failed to parse AI output, attempting to fix.
 If you see this warning frequently, it's likely that your prompt is confusing the AI. Try changing it up slightly.
2023-04-23 14:30:55,157 INFO Welcome to Auto-GPT!  Enter the name of your AI and its role below. Entering nothing will load defaults.
2023-04-23 14:30:55,559 INFO Name your AI:  For example, 'Entrepreneur-GPT'
2023-04-23 14:31:15,816 INFO GPT here! I am at your service.
2023-04-23 14:31:16,041 INFO Describe your AI's role:  For example, 'an AI designed to autonomously develop and run businesses with the sole goal of increasing your net worth.'
2023-04-23 14:31:20,343 INFO Enter up to 5 goals for your AI:  For example: 
Increase net worth, Grow Twitter Account, Develop and manage multiple businesses autonomously'
2023-04-23 14:31:37,338 INFO Using memory of type: NoMemory
2023-04-23 14:31:37,484 INFO Using Browser: chrome
2023-04-23 14:32:31,135 INFO  THOUGHTS: Based on the goals and constraints provided, the most relevant command to use for the given task of creating an algorithm for making normal maps using 4 images of different light angles would be to evaluate code.
2023-04-23 14:32:31,814 INFO REASONING: Evaluating code would allow me to review and analyze the code for creating normal maps and provide suggestions for improvement. This would be a suitable approach to ensure the algorithm is efficient and effective.
2023-04-23 14:32:32,403 INFO PLAN: 
2023-04-23 14:32:32,476 INFO -  Evaluate the code for creating normal maps.
2023-04-23 14:32:32,772 INFO -  Review and analyze the code for efficiency and effectiveness.
2023-04-23 14:32:33,071 INFO -  Provide suggestions for improvement.
2023-04-23 14:32:33,231 INFO -  Ensure the algorithm meets the goal of creating normal maps using 4 images of different light angles.
2023-04-23 14:32:33,612 INFO CRITICISM: I need to ensure that the code provided for evaluation is within the scope of my capabilities and that I do not exceed the constraint of 4000-word limit for short-term memory.
2023-04-23 14:32:34,251 INFO Attempting to fix JSON by finding outermost brackets
 
2023-04-23 14:32:34,543 INFO Apparently json was fixed. 
2023-04-23 14:32:34,705 INFO NEXT ACTION:  COMMAND = evaluate_code  ARGUMENTS = {'code': '<full_code_string>'}
2023-04-23 14:34:34,594 INFO -=-=-=-=-=-=-= COMMAND AUTHORISED BY USER -=-=-=-=-=-=-= 
2023-04-23 14:34:44,361 INFO SYSTEM:  Command evaluate_code returned: You can replace 'Suggestion 1', 'Suggestion 2', 'Suggestion 3', etc. with actual suggestions for improving the given code. These suggestions could include optimizing performance, improving readability, adhering to coding conventions, and addressing potential bugs or errors.

2023-04-23 14:39:42,133 INFO Welcome to Auto-GPT!  Enter the name of your AI and its role below. Entering nothing will load defaults.
2023-04-23 14:39:42,587 INFO Name your AI:  For example, 'Entrepreneur-GPT'
2023-04-23 14:40:15,988 INFO GPT here! I am at your service.
2023-04-23 14:40:16,261 INFO Describe your AI's role:  For example, 'an AI designed to autonomously develop and run businesses with the sole goal of increasing your net worth.'
2023-04-23 14:40:40,378 INFO Enter up to 5 goals for your AI:  For example: 
Increase net worth, Grow Twitter Account, Develop and manage multiple businesses autonomously'
2023-04-23 14:41:03,495 INFO Using memory of type: NoMemory
2023-04-23 14:41:03,703 INFO Using Browser: chrome
2023-04-23 14:41:59,204 INFO  THOUGHTS: Based on the given goals and constraints, the next logical command to use would be to clone a repository in order to access any necessary code or resources for creating the algorithm for making normal maps using 4 images of different light angles.
2023-04-23 14:41:59,967 INFO REASONING: Cloning a repository would allow me to access any existing code or resources that may be useful for creating the normal map algorithm. This would be an efficient way to gather relevant information and potentially speed up the development process.
2023-04-23 14:42:00,544 INFO PLAN: 
2023-04-23 14:42:00,625 INFO -  Clone the repository using the 'clone_repository' command.
2023-04-23 14:42:00,868 INFO -  Review the code and resources available in the cloned repository.
2023-04-23 14:42:01,138 INFO -  Utilize the information from the repository to develop the normal map algorithm.
2023-04-23 14:42:01,521 INFO -  Save any important information or code to files for future reference.
2023-04-23 14:42:01,796 INFO CRITICISM: I need to ensure that the repository I clone is reliable and trustworthy, and that any code or resources I use are properly cited and comply with applicable licenses or regulations.
2023-04-23 14:42:02,417 INFO Attempting to fix JSON by finding outermost brackets
 
2023-04-23 14:42:02,669 INFO Apparently json was fixed. 
2023-04-23 14:42:02,823 INFO NEXT ACTION:  COMMAND = clone_repository  ARGUMENTS = {'repository_url': '<url>', 'clone_path': '<directory>'}
2023-04-23 14:43:40,205 INFO -=-=-=-=-=-=-= COMMAND AUTHORISED BY USER -=-=-=-=-=-=-= 
2023-04-23 14:43:40,391 INFO SYSTEM:  Command clone_repository returned: Error: Cmd('git') failed due to: exit code(128)
  cmdline: git clone -v -- <url> <directory>
  stderr: 'fatal: repository '<url>' does not exist
'
2023-04-23 14:44:54,111 INFO  THOUGHTS: Based on the given goals and constraints, the next logical command to use would be to clone a repository in order to access any necessary code or resources for creating the algorithm for making normal maps using 4 images of different light angles.
2023-04-23 14:44:54,771 INFO REASONING: Cloning a repository would allow me to access any existing code or resources that may be useful for creating the normal map algorithm. This would be an efficient way to gather relevant information and potentially speed up the development process.
2023-04-23 14:44:55,402 INFO PLAN: 
2023-04-23 14:44:55,497 INFO -  Clone the repository using the 'clone_repository' command.
2023-04-23 14:44:55,773 INFO -  Review the code and resources available in the cloned repository.
2023-04-23 14:44:56,103 INFO -  Utilize the information from the repository to develop the normal map algorithm.
2023-04-23 14:44:56,428 INFO -  Save any important information or code to files for future reference.
2023-04-23 14:44:56,827 INFO CRITICISM: I need to ensure that the repository I clone is reliable and trustworthy, and that any code or resources I use are properly cited and comply with applicable licenses or regulations.
2023-04-23 14:44:57,388 INFO Attempting to fix JSON by finding outermost brackets
 
2023-04-23 14:44:57,638 INFO Apparently json was fixed. 
2023-04-23 14:44:57,796 INFO NEXT ACTION:  COMMAND = clone_repository  ARGUMENTS = {'repository_url': '<url>', 'clone_path': '<directory>'}
2023-04-23 14:46:06,951 INFO SYSTEM:  Human feedback: you need to use actual url instead of <url>
2023-04-23 14:47:32,996 INFO  THOUGHTS: Based on the given goals and constraints, the next logical command to use would be to clone a repository in order to access any necessary code or resources for creating the algorithm for making normal maps using 4 images of different light angles.
2023-04-23 14:47:33,614 INFO REASONING: Cloning a repository would allow me to access any existing code or resources that may be useful for creating the normal map algorithm. This would be an efficient way to gather relevant information and potentially speed up the development process.
2023-04-23 14:47:34,247 INFO PLAN: 
2023-04-23 14:47:34,295 INFO -  Clone the repository using the 'clone_repository' command.
2023-04-23 14:47:34,559 INFO -  Review the code and resources available in the cloned repository.
2023-04-23 14:47:34,846 INFO -  Utilize the information from the repository to develop the normal map algorithm.
2023-04-23 14:47:35,274 INFO -  Save any important information or code to files for future reference.
2023-04-23 14:47:35,606 INFO CRITICISM: I need to ensure that the repository I clone is reliable and trustworthy, and that any code or resources I use are properly cited and comply with applicable licenses or regulations.
2023-04-23 14:47:36,182 INFO Attempting to fix JSON by finding outermost brackets
 
2023-04-23 14:47:36,459 INFO Apparently json was fixed. 
2023-04-23 14:47:36,619 INFO NEXT ACTION:  COMMAND = clone_repository  ARGUMENTS = {'repository_url': '<url>', 'clone_path': '<directory>'}

Apr 23 '23 07:04 max-hsieh

FWIW, I also ran into this issue - so here's my 2c:

First, I simply expanded the add_constraints section at the top of the prompt generator to cover executing shell commands. While all constraints were successfully added, that didn't make much of a difference. Once you think about it, it makes perfect sense: these constraints are used in every (?) prompt, right?

So, what I tried next worked MUCH better: modify the shell_execute command to take into account surrounding context.

The first step, on a linux system, was to always execute lsb_release -a to show it what distro is available.

Next, I told it to always run the equivalent of --version before running the actual command, to ensure that the LLM has proper context. That fixed the majority of issues, but quickly got a little redundant.

I then changed the constraint to use a local "shell-env.json" file to always dump the name/version number of shell tools into the file and look there first.

The key difference really is adding these constraints CONDITIONALLY, i.e. only when you are actually wanting to execute a shell command - so, these should not be added per se, you really only want those to be added directly before executing shell commands. I added a new stage which I called "plan_shell_invocation", which would then invoke lsb_release and run/serialize --version into the json file.

The key takeaway here being that we cannot simply add a bunch of constraints to the prompt generator, rather we need to use specific constraints for different types of "actions" (command execution really being just one version of an action).
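To make the idea concrete, here is a minimal sketch of such a pre-execution stage. It is not the code I actually ran: probe_tool and the JSON layout are made up for illustration, and only the "shell-env.json" file name and the plan_shell_invocation stage name come from the description above. It looks up each tool with shutil.which, records the first line of its --version output, and only probes tools that are not in the cache yet.

```python
import json
import shutil
import subprocess
from pathlib import Path


def probe_tool(name: str, timeout: float = 5.0) -> dict:
    """Return the path and first line of `<name> --version` (hypothetical helper)."""
    path = shutil.which(name)
    if path is None:
        return {"path": None, "version": None}
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        # Some tools print their version to stderr instead of stdout.
        output = (result.stdout or result.stderr).strip()
        version = output.splitlines()[0] if output else None
    except (OSError, subprocess.TimeoutExpired):
        version = None
    return {"path": path, "version": version}


def plan_shell_invocation(tools, cache_file="shell-env.json") -> dict:
    """Probe only the tools missing from the cache, then persist the cache."""
    cache_path = Path(cache_file)
    env = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    for tool in tools:
        if tool not in env:
            env[tool] = probe_tool(tool)
    cache_path.write_text(json.dumps(env, indent=2))
    return env
```

The returned dictionary (or the file contents) could then be appended to the prompt only when a shell command is about to be chosen, which keeps it out of every other prompt.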

For testing purposes, you can simply edit prompt.py and modify the constraint/description of the shell_execute command:

    # Only add shell command to the prompt if the AI is allowed to execute it
    if cfg.execute_local_commands:
        commands.append(
            (
                "Execute Shell Command, non-interactive commands only (NOTE: First check installed version number)",
                "execute_shell",
                {"command_line": "<command_line>"},
            ),
        )

EDIT: It was only after the fact, that I learnt that I had just re-invented the wheel, so posting my feedback there instead: https://github.com/hdkiller/Auto-GPT-SystemInfo/issues/4

Apr 30 '23 06:04 Boostrix

FWIW, I've been running into more issues relating to shell command execution, and as I mentioned elsewhere, the majority of these can be solved by adding custom constraints/performance evaluations surrounding the execution of a shell command. However, that gobbles up global context, whereas it would make more sense to support context-specific constraints instead, because these are primarily needed before executing a shell command. Basically, some constraints/performance evaluations are more/only relevant in a given context, such as command execution.

This also isn't solved by the system info plugin mentioned before. Another user mentioned wanting to use this in conjunction with ssh (#3420), so there probably is a real need to extend this feature and introduce a dedicated helper class to manage execution of shell commands.

This helper class (or plugin!) would ideally provide an option to "prepare" execution of shell commands to provide sufficient context to the LLM, before picking an actual command.

hostname/IP and credentials if necessary (for use over telnet/ssh)
gather info about OS/platform (lsb_release -a)
should know what shell/version is used by default
gather availability of tools/alternatives (which on *nix, think looking for curl, wget, lynx or telnet to fetch something over http)
gather version information for tools available (--version, this is necessary because some versions don't support certain args)
explicitly support setting the CWD for these commands (this often fails when using nested tasks/agents)
serialize this stuff to a JSON file or use short term memory for persistence ?
support a blacklist/whitelist of shell commands (would also solve the issue of vim/nano etc being executed)
provide a command history buffer/MRU list to keep track of previously executed commands that were successful as per: https://github.com/Significant-Gravitas/Auto-GPT/issues/3320#issuecomment-1531104200
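As a rough illustration of the last two points, a helper along these lines could combine a deny-list with a most-recently-used log of commands that worked. This is only a sketch: ShellCommandHistory and its methods are hypothetical names, not existing Auto-GPT code, and the check only inspects the first token of the command line, so something like "sudo vim" or "ls; vim" would slip through.

```python
import shlex
from collections import OrderedDict


class ShellCommandHistory:
    """Hypothetical helper: deny-list check plus an MRU log of successful commands."""

    def __init__(self, denylist=("vim", "nano"), max_entries=20):
        self.denylist = set(denylist)
        self.max_entries = max_entries
        self._successes = OrderedDict()  # command line -> short description

    def is_allowed(self, command_line: str) -> bool:
        """Reject empty or unparsable commands and deny-listed executables."""
        try:
            parts = shlex.split(command_line)
        except ValueError:  # e.g. unbalanced quotes
            return False
        if not parts:
            return False
        executable = parts[0].rsplit("/", 1)[-1]
        return executable not in self.denylist

    def record_success(self, command_line: str, description: str = "") -> None:
        """Move the command to the most-recent position, evicting the oldest."""
        self._successes.pop(command_line, None)
        self._successes[command_line] = description
        while len(self._successes) > self.max_entries:
            self._successes.popitem(last=False)

    def recent(self) -> list:
        """Return (command, description) pairs, most recent first."""
        return list(reversed(self._successes.items()))
```

The output of recent() is the part that would be fed back to the LLM as surrounding context before it picks the next command.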

The MRU/LRU thingy is probably more useful than one might think - for instance, for the time being, the system will even prefer built-in commands like generate_image, despite them being unavailable (no image provider). Obviously, that could be solved just for this particular command - but the more intelligent solution would be to remember that something simply didn't work, and remove it from the solution/option space (or at least downgrade its popularity).

And in fact, sorting the list of options/prompts according to relevance might make sense either way. At the very least, we could provide a relevance rating given the current context and list that next to each option.

May 02 '23 09:05 Boostrix

Good idea. This “prepare for execution” script should be executed automatically by all the plugins and commands, so there is no need to create a specific plugin for this. Only the system info plugin makes sense as a separate plugin, and only to give the user a diagnostic report when they ask the AI for one. Just be sure to make this cross-platform (via Python, maybe?).

May 02 '23 11:05 Emasoft

Regarding the MRU, having a history of previously executed shell commands that worked would mean that the pool of potential solutions/candidates could be traversed with preference. This "history" could also contain a textual description of what the command did, to provide surrounding context for the LLM to pick an adequate candidate. For starters, this history would merely contain path/file name locations; subsequently it would contain --version info, and from then on it would contain actual examples of using the shell command (think a vector of examples).

On a Linux based system, we could even expect to be able to run --help --verbose for these commands to obtain additional contextual information or run $ zcat /usr/share/info/$TOOL.info.gz to obtain a ton of locally available info (probably more relevant for locally running LLMs)

May 02 '23 11:05 Boostrix

This only just occurred to me, but I keep seeing several instances where the LLM is trying to execute commands without substituting its default arguments (hints). Checking for that would be one of the lowest-hanging fruits from a heuristics perspective to detect that something is about to go haywire: if the original description of the command is found verbatim in its execution, the LLM failed big time.
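A check like that could be as small as the sketch below. The function name and the regular expression are my own assumptions; the only thing taken from the logs above is the shape of the hints ('<url>', '<directory>', '<full_code_string>'). It flags any string argument that is empty or still consists of nothing but an angle-bracket hint.

```python
import re

# Matches values that are nothing but a template hint, e.g. "<url>".
PLACEHOLDER = re.compile(r"<[A-Za-z_][A-Za-z0-9_ ]*>")


def unsubstituted_args(arguments: dict) -> list:
    """Return the names of arguments that are empty or still a template hint."""
    return [
        key
        for key, value in arguments.items()
        if isinstance(value, str)
        and (not value.strip() or PLACEHOLDER.fullmatch(value.strip()))
    ]
```

If the returned list is non-empty, the command could be rejected before it is ever shown to the user for authorisation, and the names of the offending arguments could be fed back to the LLM.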

May 04 '23 19:05 Boostrix

I have been experimenting with AutoGPT and trying to make it run in the wild by providing a sandboxed environment where I have no worries about it doing something really crazy or dumb. I have added some of my observations and feedback on the issues over here.

After going through the comments on this thread, I see a few common points being discussed:

Provide information about the OS to AutoGPT
Providing information about hostnames etc for ssh related commands
The tags of this issue make it seem that it's a model issue over an issue in the tool
Modifying the explanation of the function given to GPT can improve the outputs

All these are good points, but there is one major missing aspect that inhibits AutoGPT from doing any non-trivial computation. And this aspect is "Program-State" / "Shell-State". The way AutoGPT works currently is it will call OAI, get the Plans/action sequences and then execute them in isolation. For example, if it executes the cd abc in one command, then the next command will not be running from inside the abc folder even if the model is thinking it is inside the abc folder based on the plan and the execution logs. The main reason behind this is that we execute shell commands in a subprocess where the computation is "stateless". Whatever command you run, the following command will not run in the exact environment where the previous command finished execution. This is a serious limitation because even if you add any amount of "prompt magic", you won't be able to circumvent this issue since the model will always have imperfect information upon which it devises the plans and chooses its actions.

Another limitation with AutoGPT is that, to run Python, it runs the code it wrote in a separate Docker container. This is not good since there is a disconnect between the current shell state and the state of the Docker container. For example:

Current file path may be different from what it is expecting
environment variables may be different from what it's expecting
Access to binaries may be completely different between the two environments.

How can we fix it?

Providing a stateful shell to AutoGPT

The most straightforward idea is to have a shell terminal-like object in AutoGPT, which AutoGPT interacts with. This object is stateful, so if it runs any "stateful" commands that alter the shell's state, then the subsequent command will resume execution from the same state. For example, if AutoGPT changed directories in one command and downloaded some binary in the next, then the binary should be present in the directory it cd'd into in the previous command. I have highlighted one strategy in a discussion thread and would be glad to discuss more ideas on that front.
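A very small sketch of the idea, assuming a POSIX sh is available: the StatefulShell class below is hypothetical and is not the strategy from the discussion thread. It only carries the working directory from one command to the next by asking the shell to print $PWD after each command; environment variables and background processes are not preserved.

```python
import os
import subprocess
import uuid


class StatefulShell:
    """Sketch: remembers the working directory between commands (POSIX sh only)."""

    def __init__(self, cwd=None):
        self.cwd = os.path.abspath(cwd or os.getcwd())

    def run(self, command_line: str, timeout: float = 60.0):
        """Run a command in the remembered directory; return (exit_code, output)."""
        marker = f"__CWD_{uuid.uuid4().hex}__"
        # After the command, keep its exit code and print the final directory.
        script = (
            f"{command_line}\n"
            "__rc=$?\n"
            f'printf "\\n%s%s" "{marker}" "$PWD"\n'
            "exit $__rc"
        )
        result = subprocess.run(
            ["sh", "-c", script],
            cwd=self.cwd,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        stdout = result.stdout
        separator = "\n" + marker
        if separator in stdout:
            output, _, new_cwd = stdout.rpartition(separator)
            if os.path.isdir(new_cwd):
                self.cwd = new_cwd
        else:
            # The command exited the shell early, so no directory was reported.
            output = stdout
        return result.returncode, output
```

With this, running "cd abc" followed by "pwd" reports the abc folder on the second call, which is exactly the case that fails with isolated subprocess calls.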

Integrating Shell State in the Prompt

This is just a hunch and is untested, so take it with a grain of salt.

The prompt should contain the current state of the shell, such as information about its:

cwd / files in cwd
running-process
environment variables (with some bits of obfuscation as deemed necessary) (We shouldn't leak secrets to OAI or any other LLM backend provider)
System CPU/Memory information

Having this information outside the outputs of the commands it executes could help with decision-making, since a lot of this information is related to what it should know (like any good programmer) about its current shell state.
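As an untested sketch of what such a prompt section could look like, the snippet below builds a short text summary from the standard library only. The function names and the list of "secret-looking" name patterns are assumptions for illustration; running processes and memory usage are left out because they would need a third-party package.

```python
import os
import re

# Variable names that look like they hold secrets (an assumed, incomplete list).
SECRET_NAME = re.compile(r"KEY|TOKEN|SECRET|PASSWORD|PASSWD|CREDENTIAL", re.IGNORECASE)


def redact_env(env: dict) -> dict:
    """Replace the values of secret-looking variables before they reach the LLM."""
    return {
        name: ("<redacted>" if SECRET_NAME.search(name) else value)
        for name, value in env.items()
    }


def shell_state_summary(cwd=None, env=None, max_files=20) -> str:
    """Build a short, prompt-ready description of the current shell state."""
    cwd = cwd or os.getcwd()
    env = redact_env(dict(os.environ if env is None else env))
    files = sorted(os.listdir(cwd))[:max_files]
    lines = [
        f"cwd: {cwd}",
        "files: " + ", ".join(files),
        f"cpu_count: {os.cpu_count()}",
        "env:",
    ]
    lines += [f"  {name}={value}" for name, value in sorted(env.items())]
    return "\n".join(lines)
```

Matching on variable names is a blunt filter, so a real version would probably want an explicit allow-list of variables that are safe to send.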

May 06 '23 00:05 valayDave

Again, that's really very thoughtful and helpful. I suggested coming up with a draft PR to collate related info there, so that whoever revisits the command system can take a look at these comments: https://github.com/Significant-Gravitas/Auto-GPT/pull/3824#issuecomment-1537055503

May 06 '23 05:05 Boostrix

According to my own experience dealing with/learning auto-gpt for the last 4-5 weeks, it all comes down to context. If the STM doesn't contain enough context, it doesn't matter what you tell it to do. It will try one command after another blindly, even if you tell it that this and that command didn't work. If, however, enough context is retained in STM, autogpt can figure out quite quickly what it did wrong and find the correct way forward.

May 26 '23 02:05 W3Warp

This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.

Sep 06 '23 21:09 github-actions[bot]

This issue was closed automatically because it has been stale for 10 days with no activity.

Sep 18 '23 01:09 github-actions[bot]
