
Failing to use pre-seeded data and/or chunk a large JSON file

Explorergt92 opened this issue 2 years ago • 6 comments

⚠️ Search for existing issues first ⚠️

  • [X] I have searched the existing issues, and there is no existing issue for my problem

GPT-3 or GPT-4

  • [x] I am using Auto-GPT with GPT-3 (GPT-3.5)
  • [x] I am using Auto-GPT with GPT-4
  • Using GPT-4 as SMART
  • Using GPT-3 as FAST

Steps to reproduce 🕹

Started a fresh Redis server in Docker and pre-seeded the issues_data.json file into Redis with:

python data_ingestion.py --f issues_data.json --init --overlap 300 --max_length 3000

I verified that the size of the memory file `dump.rdb` in the Redis container grew, from nearly nothing to 64MB.
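
As a sanity check, the seeded data can also be confirmed from Python (rough sketch, assuming the container publishes the default port 6379 on localhost and redis-py is installed):

import redis  # pip install redis

# assumes the Docker container exposes Redis on localhost:6379
r = redis.Redis(host="localhost", port=6379)

print("keys stored:", r.dbsize())                              # number of keys in the DB
print("memory used:", r.info("memory")["used_memory_human"])   # e.g. "64.00M"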

Then started Auto-GPT with:

python -m autogpt

Used a design.txt file to tell it how it's supposed to work. The file contains:

Design Document for GitHubIssuesFAQ-Ai

Table of Contents

  1. Introduction
  2. System Overview
  3. Functional Requirements
  4. Non-Functional Requirements
  5. System Architecture
  6. Technologies
  7. Testing
  8. Conclusion

Introduction

GitHubIssuesFAQ-Ai is an AI designed to autonomously manage GitHub issues to make it easier for users to find solutions to their issues. The AI will read design specifications, follow advice, analyze frequently asked questions, and generate a FAQ based on the most common questions and their answers.

System Overview

The GitHubIssuesFAQ-Ai will perform the following tasks:

  1. Read design.txt and follow its design specifications.
  2. Read advice.txt and obey it every 10 minutes.
  3. Use the information saved in its memory to determine the most frequently asked questions from the repo's issues posts.
  4. Determine the best answer to the most frequently asked questions from the issues comments.
  5. Write a FAQ and answer the most frequently asked questions.

Functional Requirements

  1. Read design specifications: The AI will read design.txt and follow the design specifications provided in the file.
  2. Follow advice: The AI will read advice.txt and obey the advice provided in the file every 10 minutes.
  3. Analyze frequently asked questions: The AI will analyze the repo's issues posts and determine the most frequently asked questions.
  4. Determine the best answer: The AI will analyze the issues comments and determine the best answer to the most frequently asked questions.
  5. Generate FAQ: The AI will write a FAQ document that answers the most frequently asked questions.

Non-Functional Requirements

  1. Performance: The AI should be able to handle a large number of issues and comments without significant performance degradation.
  2. Scalability: The AI should be able to scale to handle an increasing number of issues and comments.
  3. Accuracy: The AI should accurately identify the most frequently asked questions and their best answers.
  4. Usability: The generated FAQ should be easy to read and understand by users.

System Architecture

The GitHubIssuesFAQ-Ai system architecture consists of the following components:

  1. Data Ingestion: This component is responsible for reading the design.txt and advice.txt files and ingesting the repo's issues and comments data.
  2. Data Processing: This component is responsible for processing the ingested data and determining the most frequently asked questions and their best answers.
  3. FAQ Generation: This component is responsible for generating the FAQ document based on the most frequently asked questions and their best answers.
  4. Output: This component is responsible for outputting the generated FAQ document.

Technologies

The following technologies will be used for the development of GitHubIssuesFAQ-Ai:

  1. Python: The AI will be developed using Python programming language.
  2. Natural Language Processing (NLP) libraries: Libraries such as NLTK, spaCy, and Gensim will be used for text processing and analysis.
  3. GitHub API: The GitHub API will be used to access the repo's issues and comments data.

Testing

The GitHubIssuesFAQ-Ai will be tested using the following methods:

  1. Unit testing: Unit tests will be written for each component to ensure that they are functioning correctly.
  2. Integration testing: Integration tests will be written to ensure that the components are working together correctly.
  3. System testing: The entire system will be tested to ensure that it meets the functional and non-functional requirements.
  4. User Acceptance testing: The generated FAQ document will be reviewed by users to ensure that it is easy to read and understand.

Conclusion

The GitHubIssuesFAQ-Ai is an AI designed to autonomously analyze issues posted in the GitHub repository and generate a FAQ document based on the most frequently asked questions and their best answers. The AI will read design specifications, follow advice, analyze frequently asked questions, and generate a FAQ. The system architecture consists of data ingestion, data processing, FAQ generation, and output components. The AI will be developed using Python, NLP libraries, and the GitHub API. The system will be tested using unit, integration, system, and user acceptance testing methods.

Current behavior 😯

Auto-GPT failed to use the pre-seeded data at all and instead downloaded the main repo to the /Auto-GPT/auto_gpt_workspace folder to gather data.

After trying and failing to prompt it to use the Redis memory, I instructed it to 'read issues_data.json'. After about 5 minutes of the data it was reading scrolling up the screen, I got an error that reads:

openai.error.InvalidRequestError: This model's maximum context length is 8191 tokens, however you requested 7181084 tokens (7181084 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.

Auto-GPT then crashes back to the PowerShell (PS) prompt.
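
For reference, the reported token count looks plausible for the whole file; a quick way to check it locally (rough sketch using tiktoken; cl100k_base should be the encoding GPT-3.5/GPT-4 use):

import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by GPT-3.5/GPT-4

with open("issues_data.json", encoding="utf-8") as f:
    text = f.read()

# roughly 877x the model's 8,191-token limit
print(f"{len(enc.encode(text)):,} tokens")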

Expected behavior 🤔

Expected it to use the data pre-seeded into Redis memory.

As a workaround, I hoped it would at least be able to use the data from the issues_data.json file directly.

Your prompt 📝

ai_goals:
- Read design.txt and follow its design specifications.
- Read advice.txt and obey it every 10 minutes.
- Use the information saved in your memory to determine the most frequently asked questions from the repo's issues posts.
- Determine the best answer to the most frequently asked questions from the issues comments.
- Write a FAQ and answer the most frequently asked questions.
ai_name: GitHubIssuesFAQ-Ai
ai_role: an AI designed to autonomously manage GitHub issues to make it easier for users to find solutions to their issues.

advice.txt contains

  1. Use the data saved in your memory as it already has all the JSON data from the repos you are watching.

Explorergt92 avatar Apr 17 '23 03:04 Explorergt92

Here is the script I used to generate the JSON file:

import requests
import json

def get_all_issues(repo_url, access_token):
    api_url = repo_url.replace("https://github.com", "https://api.github.com/repos")
    api_url += "/issues"
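    # Note: GitHub's "list issues" endpoint also returns pull requests;
    # they can be skipped later by dropping items that contain a "pull_request" key.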
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"token {access_token}"
    }
    params = {
        "state": "open",
        "per_page": 100,
        "page": 1
    }
    all_issues = []

    while True:
        response = requests.get(api_url, headers=headers, params=params)
        if response.status_code != 200:
            print(f"Error: {response.status_code}")
            break

        issues = response.json()
        if not issues:
            break

        all_issues.extend(issues)
        params["page"] += 1

    return all_issues

def get_comments_for_issue(issue_url, access_token):
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"token {access_token}"
    }
    params = {
        "per_page": 100,
        "page": 1
    }
    all_comments = []

    while True:
        response = requests.get(issue_url, headers=headers, params=params)
        if response.status_code != 200:
            print(f"Error: {response.status_code}")
            break

        comments = response.json()
        if not comments:
            break

        all_comments.extend(comments)
        params["page"] += 1

    return all_comments

def save_issues_to_file(issues, output_file):
    with open(output_file, "w") as f:
        json.dump(issues, f, indent=4)

def main():
    repo_url = "https://github.com/Significant-Gravitas/Auto-GPT"
    access_token = "your_github_pat_token_here"
    output_file = "issues_data.json"

    issues = get_all_issues(repo_url, access_token)

    # Get comments for each issue
    for issue in issues:
        comments = get_comments_for_issue(issue["comments_url"], access_token)
        issue["comments"] = comments

    save_issues_to_file(issues, output_file)
    print(f"Saved {len(issues)} issues with comments to {output_file}")

if __name__ == "__main__":
    main()

Explorergt92 avatar Apr 17 '23 03:04 Explorergt92

read issues_data.json' and after 5 minutes of the data it's reading scrolling up the screen I get an error that reads: openai.error.InvalidRequestError: This model's maximum context length is 8191 tokens, however you requested 7181084 tokens (7181084 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.

That's because your Auto-GPT sent the whole JSON to OpenAI.
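
If you want a workaround in the meantime, pre-splitting the file into pieces small enough to read one at a time might look roughly like this (just a sketch; the 3,000-character chunk size and the issues_part_N.json filenames are arbitrary):

import json

CHUNK_CHARS = 3000  # arbitrary; keep well under the model's context window

with open("issues_data.json") as f:
    issues = json.load(f)

chunk, size, part = [], 0, 0
for issue in issues:
    text = json.dumps(issue)
    # start a new output file once the current chunk would exceed the limit
    if chunk and size + len(text) > CHUNK_CHARS:
        with open(f"issues_part_{part}.json", "w") as out:
            json.dump(chunk, out, indent=2)
        chunk, size, part = [], 0, part + 1
    chunk.append(issue)
    size += len(text)

if chunk:  # write whatever is left over
    with open(f"issues_part_{part}.json", "w") as out:
        json.dump(chunk, out, indent=2)

Each issues_part_N.json could then be fed to data_ingestion.py or read individually without blowing the context window.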

Currently, there's no real command for Auto-GPT to access its memory, only to add to it, though I have had it execute shell commands to access memory. I have not read all of the code thoroughly, but I don't think there is logic for reflecting on stored memory yet; I could have overlooked it.

I played with adding a 'memory_get' command for retrieving Redis memories today, if you want to try it. I haven't really tested this yet, but I can say that Auto-GPT will use the memory_get and memory_add commands. With proper prompting it might be possible to make use of memory to some degree, but no promises. https://github.com/Slowly-Grokking/Auto-GPT/tree/redis_memory_retrieval_TESTING

I think we will need to program some basic logical steps for Auto-GPT to use as building blocks, things like Read > Summarize > Save, and Check task list > do task > cross task off > repeat, in order for functionality to really take off. A rough sketch of the first one is below.
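
Roughly what I mean by Read > Summarize > Save, outside of Auto-GPT (just an illustration; memory here stands in for whatever backend is wired up, and the model and chunk size are placeholders):

import openai  # 0.27.x-style API, matching the error above; reads OPENAI_API_KEY from the environment

def summarize(text: str, model: str = "gpt-3.5-turbo") -> str:
    # one chunk in, one short summary out
    resp = openai.ChatCompletion.create(
        model=model,
        messages=[
            {"role": "system", "content": "Summarize the following GitHub issue data."},
            {"role": "user", "content": text},
        ],
    )
    return resp["choices"][0]["message"]["content"]

def read_summarize_save(path: str, memory, chunk_chars: int = 3000) -> None:
    with open(path) as f:
        text = f.read()
    for i in range(0, len(text), chunk_chars):
        summary = summarize(text[i : i + chunk_chars])
        memory.add(summary)  # hypothetical: whatever add/ingest call the memory backend exposes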

Slowly-Grokking avatar Apr 19 '23 04:04 Slowly-Grokking

~~Seems to be covered by #2801~~

Pwuts avatar Apr 22 '23 14:04 Pwuts

Seems to be covered by #2801

This issue is described a bit poorly, but the main problem the OP is reporting is not the chunking issue. It's the failure to use memories, which are different from stored files. This has not been covered, unless someone has added commands to make use of memory.get(). I haven't had a chance to go through all of the reorg commits this week, but it doesn't appear that we've gained this functionality.

Slowly-Grokking avatar Apr 22 '23 18:04 Slowly-Grokking

Thanks for correcting me @Slowly-Grokking

Pwuts avatar Apr 22 '23 20:04 Pwuts

Thanks for correcting me @Slowly-Grokking

Oh, no worries. I really just wanted to make a note of this, even if just for myself to see later.

Slowly-Grokking avatar Apr 24 '23 15:04 Slowly-Grokking

This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.

github-actions[bot] avatar Sep 06 '23 21:09 github-actions[bot]

This issue was closed automatically because it has been stale for 10 days with no activity.

github-actions[bot] avatar Sep 17 '23 01:09 github-actions[bot]