
Semi-Active Mode

Open slavakurilyak opened this issue 1 year ago • 7 comments

Summary

I propose a new feature called "Semi-Active Mode," which enables the AI to run in a semi-automated manner, seeking user assistance when it encounters uncertainty, confusion, or ambiguity. This feature combines the benefits of Continuous Mode with the human-in-the-loop experience of Active Mode.

Background

Currently, there are two modes available:

  1. "Continuous Mode," which allows the AI to run without user authorization and is 100% automated. However, this mode is not recommended due to its potential dangers, such as running indefinitely or performing actions that users might not approve.
  2. "Active Mode," which enables the AI to run while actively prompting the user with chain-of-thought questions when executing each subsequent action. This allows users to actively participate while the AI agent runs, ensuring a human-in-the-loop experience.

To further enhance user engagement and provide a more flexible experience, I propose a new feature called "Semi-Active Mode."

Feature Description

In "Semi-Active Mode," the AI will:

  1. Execute an action.
  2. Evaluate its confidence in the action or result.
  3. If the confidence is below a certain threshold, prompt the user for assistance or clarification.
  4. Incorporate the user's input and continue to the next action.

This interaction pattern allows users to assist when needed while still benefiting from the AI's capabilities. It strikes a balance between full automation and active participation, fostering a collaborative environment between the user and the AI system.
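As a rough illustration, the loop could look something like the sketch below (the threshold value, the confidence scoring, and the helper functions are hypothetical placeholders, not existing AutoGPT APIs):

# Hypothetical sketch of the Semi-Active loop described above.
# execute_action(), assess_confidence() and incorporate_feedback() stand in
# for the agent's real logic and do not exist in AutoGPT today.
CONFIDENCE_THRESHOLD = 0.7  # assumed cutoff; anything below asks the human

def ask_user(question: str) -> str:
    """Ask the human for guidance via stdin."""
    return input(f"{question}\n> ")

def semi_active_step(action: str) -> None:
    result = execute_action(action)                  # 1. execute an action
    confidence = assess_confidence(action, result)   # 2. self-evaluate confidence (0.0-1.0)
    if confidence < CONFIDENCE_THRESHOLD:            # 3. below threshold, ask for help
        guidance = ask_user(
            f"I'm only {confidence:.0%} confident about '{action}'. Any guidance?"
        )
        incorporate_feedback(guidance)               # 4. fold the answer in and continue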

Example Implementation

Here's an example implementation using LangChain's Human-As-A-Tool feature:

from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from langchain.agents import load_tools, initialize_agent
from langchain.agents.agent_types import AgentType

# LLM that drives the agent's reasoning
llm = ChatOpenAI(temperature=0.0)
# Separate LLM backing the math tool
math_llm = OpenAI(temperature=0.0)

# "human" lets the agent pause and ask the user; "llm-math" handles calculations
tools = load_tools(
    ["human", "llm-math"],
    llm=math_llm,
)

agent_chain = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)

# The agent cannot know this, so it should fall back to asking the human
agent_chain.run("What is Eric Zhu's birthday?")

In this code, the AI agent seeks human assistance when it encounters uncertainty, allowing the user to guide it as needed.

Benefits

  • Enhanced user engagement
  • Reduced risk of AI performing unwanted actions
  • Increased collaboration between the user and the AI
  • Balances automation and user control

Risks and Mitigations

This feature may slow down the overall AI operation due to the need for user input in certain situations. However, this trade-off is acceptable, considering the increased control and collaboration it provides.

Request for Comments

I would appreciate feedback from the community on this suggested feature. Please share your thoughts, suggestions, and any potential concerns you may have.

slavakurilyak avatar Apr 04 '23 05:04 slavakurilyak

We're going to need some prompt engineering to ask the AI about its confidence. In the main prompt file we can add: "2. Constructively self-criticize your big-picture behavior constantly and evaluate your confidence level."

{
    "command": {
        "name": "command name",
        "args":{
            "arg name": "value"
        }
    },
    "thoughts":
    {
        "text": "thought",
        "reasoning": "reasoning",
        "plan": "- short bulleted\n- list that conveys\n- long-term plan",
        "criticism": "constructive self-criticism",
        "speak": "thoughts summary to say to user",
        "confidence": "0-100 confidence level rating"
    }
}

Please provide better prompt options that might cost fewer tokens. cc @Torantulino

Remark: this mode could enrich continuous mode too; if the AI is not confident, maybe it can retry the same action. I would say this should probably be a separate continuous mode though (continuous-with-reflection?), because it is going to cost more tokens.
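For illustration, here is a minimal sketch of how the caller could gate on the proposed "confidence" field (the threshold and field handling are assumptions, not current AutoGPT behavior):

import json

CONFIDENCE_THRESHOLD = 60  # assumed cutoff on the 0-100 scale proposed above

def needs_human_input(raw_response: str) -> bool:
    """Parse the model's JSON reply and check the proposed confidence field.

    Assumes the model fills "confidence" with a numeric value as instructed.
    """
    reply = json.loads(raw_response)
    confidence = int(reply.get("thoughts", {}).get("confidence", 0))
    return confidence < CONFIDENCE_THRESHOLD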

waynehamadi avatar Apr 04 '23 06:04 waynehamadi

Some time ago I read a paper discussing AI control and compliance (I can't find it right now). It proposed an "approval seeking" mode for AI, where the AI works autonomously on goals it clearly understands and knows how to achieve; when there is ambiguity, it seeks guidance from a superior (a human?). There are 2 types of engagement:

  1. ask for next step (what should I do next)
  2. ask for permission (this is what I think I can do, may I? )

..I put this comment just for context...

profintegra avatar Apr 04 '23 10:04 profintegra

The best experience with reports is when they can read all the source materials and try to figure it out themselves, but still come and present an awesome, intelligent summary to me and want to discuss their plans to check if they match my understanding. Typically we discuss until it's clear we're in agreement that this is a reasonable and valuable course of action.

sberney avatar Apr 06 '23 05:04 sberney

I would love to collaborate on this and have been working on a framework for classifying the risk and type of AI tasks for proper delegation:

The TACTIC framework provides a comprehensive approach to managing AI commands, covering a wide range of risks and approval levels. By implementing this tiered structure, we can maintain control over AI-driven processes while maximizing efficiency, security, and accountability.

Tier 1: Transparent (Low risk, read-only tasks) Examples: Browsing the web, searching for information, reading documents. Approval: Automatic or other AI bots.

Tier 2: Assisted (Low to medium risk, write-access tasks) Examples: Drafting and updating documents, spreadsheets, presentations. Approval: Low-level human intervention, such as a delegated assistant.

Tier 3: Collaborative (Medium risk, communication tasks) Examples: Sending emails, making phone calls, scheduling meetings. Approval: Mid-level human intervention, such as a designated supervisor.

Tier 4: Transactional (Medium to high risk, financial tasks) Examples: Using paid APIs, ordering items, making purchases. Approval: High-level human intervention, such as a manager or financial officer.

Tier 5: Intimate (High risk, sensitive tasks) Examples: Accessing sensitive data, making critical decisions, handling confidential information. Approval: Exclusive to the user/owner themselves.

Tier 6: Critical (Extremely high risk, irreversible or high-impact tasks) Examples: Initiating legal actions, making large financial investments, approving strategic partnerships. Approval: Highest level of human intervention, such as a board of directors or executive committee.

I think we need a hybrid approach to classifying commands that combines AI with human involvement to provide a more reliable solution: initial classification by GPT, followed by a human-in-the-loop review process, especially for tasks that fall under higher-risk categories. This step ensures that the AI categorization is accurate and relevant, providing an additional layer of validation.
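For what it's worth, here is a sketch of how such tiered routing could look in code (the tier assignments and command names are illustrative, not an agreed mapping):

from enum import IntEnum

class Tier(IntEnum):
    TRANSPARENT = 1    # auto-approved, read-only
    ASSISTED = 2       # delegated assistant signs off
    COLLABORATIVE = 3  # designated supervisor signs off
    TRANSACTIONAL = 4  # manager / financial officer signs off
    INTIMATE = 5       # owner only
    CRITICAL = 6       # board / executive committee

# Example mapping; in practice GPT would propose a tier and a human would review it
COMMAND_TIERS = {
    "browse_website": Tier.TRANSPARENT,
    "write_to_file": Tier.ASSISTED,
    "send_email": Tier.COLLABORATIVE,
    "make_purchase": Tier.TRANSACTIONAL,
}

def approval_required(command: str) -> bool:
    # Unknown commands default to the highest tier until a human classifies them
    tier = COMMAND_TIERS.get(command, Tier.CRITICAL)
    return tier > Tier.TRANSPARENT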

marktsears avatar Apr 24 '23 02:04 marktsears

See also: https://github.com/Significant-Gravitas/Auto-GPT/issues/3396#issuecomment-1529504806

Boostrix avatar May 01 '23 19:05 Boostrix

One thing that I and a lot of others have run into is wanting to "step in" during continuous mode, give some instructions, and then turn it back on. The same goes for choosing a number of automated steps: being able to pick, say, 50, see something going wrong, give some guidance, and hand control back over.

montanaflynn avatar May 03 '23 01:05 montanaflynn

See also: #1548

thinking out loud: this might be easier than we think using the agent messaging API - we really only need to support some form of keyboard handling at the top level that then messages the underlying (master) agent to give it some instructions or change the number of automated steps, and then resumes the agent afterwards.

And indeed, under the hood, agents need basically the exact same functionality anyway to be able to mutually influence each other by having parent agents "guide" sub-agents. Thus, we could just as well use the same mechanism for the top-level/outer-most agent, to let the human interactively control/guide the agent.

Python's keyboard module probably has most of the building blocks in place to register an event handler that triggers this chain of events (?)
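Something along these lines, assuming the third-party keyboard package and a pause flag that the agent loop checks between steps:

import keyboard  # third-party package: pip install keyboard

interrupt_requested = False

def request_interrupt():
    global interrupt_requested
    interrupt_requested = True

# Register a hotkey the user can press while the agent runs in continuous mode
keyboard.add_hotkey("ctrl+shift+i", request_interrupt)

# Inside the agent loop, between steps (pseudocode, the messaging hook is hypothetical):
#   if interrupt_requested:
#       guidance = input("Paused. New instructions or step budget: ")
#       message_agent(guidance)
#       interrupt_requested = False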

Please help take a look at PR #3083; hopefully this can then be closed or made more specific?

Boostrix avatar May 03 '23 05:05 Boostrix

ask_user command was added in #5077

Pwuts avatar Sep 09 '23 02:09 Pwuts

ask_user command was added in #5077

IIRC, the original idea was to make this support the "outer agent" - i.e. normally the user, but it could just as well be an agent. Is this now supported?

Boostrix avatar Sep 28 '23 14:09 Boostrix

Hi guys, I am working on this request_assistance feature github.com/jmikedupont2/ai-ticket

jmikedupont2 avatar Oct 03 '23 11:10 jmikedupont2

Unless I am mistaken, this got implemented a while ago? Please do join us on Discord to discuss this before working on it any longer.

Boostrix avatar Oct 04 '23 18:10 Boostrix

I am working on an extension of this idea to go much further; I was commenting here to mark this thread for later review.

jmikedupont2 avatar Oct 05 '23 12:10 jmikedupont2