AutoGPT
Failing to use pre-seeded data and/or chunk a large JSON file
⚠️ Search for existing issues first ⚠️
- [X] I have searched the existing issues, and there is no existing issue for my problem
GPT-3 or GPT-4
- [x] I am using Auto-GPT with GPT-3 (GPT-3.5)
- [x] I am using Auto-GPT with GPT-4
- Using GPT-4 as SMART
- Using GPT-3 as FAST
Steps to reproduce 🕹
Started a fresh Redis server in Docker and pre-seeded the issues_data.json file into Redis with:
`python data_ingestion.py --f issues_data.json --init --overlap 300 --max_length 3000`
I verified that the memory file `dump.rdb` in the Redis container grew from nearly nothing to 64 MB.
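To double-check from Python that the seeded data is actually reachable, a quick key count against the container works too (a sketch, assuming the Redis port 6379 is published to localhost with no password; adjust host/port to your setup):

```python
import redis

# Connect to the dockerized Redis instance and count stored keys.
r = redis.Redis(host="localhost", port=6379)
print(r.dbsize(), "keys in Redis")  # should be well above zero after seeding
```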
Ran `python -m autogpt`.
Used a design.txt file to tell it how it's supposed to work; the file contains:
Design Document for GitHubIssuesFAQ-Ai
Table of Contents
- Introduction
- System Overview
- Functional Requirements
- Non-Functional Requirements
- System Architecture
- Technologies
- Testing
- Conclusion
Introduction
GitHubIssuesFAQ-Ai is an AI designed to autonomously manage GitHub issues to make it easier for users to find solutions to their issues. The AI will read design specifications, follow advice, analyze frequently asked questions, and generate a FAQ based on the most common questions and their answers.
System Overview
The GitHubIssuesFAQ-Ai will perform the following tasks:
- Read `design.txt` and follow its design specifications.
- Read `advice.txt` and obey it every 10 minutes.
- Use the information saved in its memory to determine the most frequently asked questions from the repo's issues posts.
- Determine the best answer to the most frequently asked questions from the issues comments.
- Write a FAQ and answer the most frequently asked questions.
Functional Requirements
- Read design specifications: The AI will read `design.txt` and follow the design specifications provided in the file.
- Follow advice: The AI will read `advice.txt` and obey the advice provided in the file every 10 minutes.
- Analyze frequently asked questions: The AI will analyze the repo's issues posts and determine the most frequently asked questions.
- Determine the best answer: The AI will analyze the issues comments and determine the best answer to the most frequently asked questions.
- Generate FAQ: The AI will write a FAQ document that answers the most frequently asked questions.
Non-Functional Requirements
- Performance: The AI should be able to handle a large number of issues and comments without significant performance degradation.
- Scalability: The AI should be able to scale to handle an increasing number of issues and comments.
- Accuracy: The AI should accurately identify the most frequently asked questions and their best answers.
- Usability: The generated FAQ should be easy to read and understand by users.
System Architecture
The GitHubIssuesFAQ-Ai system architecture consists of the following components:
- Data Ingestion: This component is responsible for reading the `design.txt` and `advice.txt` files and ingesting the repo's issues and comments data.
- Data Processing: This component is responsible for processing the ingested data and determining the most frequently asked questions and their best answers.
- FAQ Generation: This component is responsible for generating the FAQ document based on the most frequently asked questions and their best answers.
- Output: This component is responsible for outputting the generated FAQ document.
Technologies
The following technologies will be used for the development of GitHubIssuesFAQ-Ai:
- Python: The AI will be developed using Python programming language.
- Natural Language Processing (NLP) libraries: Libraries such as NLTK, spaCy, and Gensim will be used for text processing and analysis.
- GitHub API: The GitHub API will be used to access the repo's issues and comments data.
Testing
The GitHubIssuesFAQ-Ai will be tested using the following methods:
- Unit testing: Unit tests will be written for each component to ensure that they are functioning correctly.
- Integration testing: Integration tests will be written to ensure that the components are working together correctly.
- System testing: The entire system will be tested to ensure that it meets the functional and non-functional requirements.
- User Acceptance testing: The generated FAQ document will be reviewed by users to ensure that it is easy to read and understand.
Conclusion
The GitHubIssuesFAQ-Ai is an AI designed to autonomously analyze issues posted in the GitHub repository and generate a FAQ document based on the most frequently asked questions and their best answers. The AI will read design specifications, follow advice, analyze frequently asked questions, and generate a FAQ. The system architecture consists of data ingestion, data processing, FAQ generation, and output components. The AI will be developed using Python, NLP libraries, and the GitHub API. The system will be tested using unit, integration, system, and user acceptance testing methods.
Current behavior 😯
Auto-GPT failed to use the pre-seeded data at all and instead downloaded the main repo to the /Auto-GPT/auto_gpt_workspace folder to gather data.
After trying and failing to prompt it to use the Redis memory, I instructed it to 'read issues_data.json'. After about 5 minutes of the data it was reading scrolling up the screen, I got an error that reads:
openai.error.InvalidRequestError: This model's maximum context length is 8191 tokens, however you requested 7181084 tokens (7181084 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.
Auto-GPT then crashes back to the PowerShell prompt.
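For reference, the token count that triggers this error can be checked up front with tiktoken (a quick sketch, assuming tiktoken is installed and the file fits in memory):

```python
import tiktoken

# Count how many GPT-4 tokens the raw JSON file works out to before sending it anywhere.
with open("issues_data.json", encoding="utf-8") as f:
    text = f.read()

enc = tiktoken.encoding_for_model("gpt-4")
print(f"{len(enc.encode(text))} tokens")  # orders of magnitude above the 8191-token limit
```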
Expected behavior 🤔
Expected it to use the data pre-seeded into Redis memory.
As a workaround, I hoped it would at least be able to use the data from the issues_data.json file directly instead.
Your prompt 📝
ai_goals:
- Read design.txt and follow its design specifications.
- Read advice.txt and obey it every 10 minutes.
- Use the information saved in your memory to determine the most frequently asked questions from the repos issues posts.
- Determine the best answer to the most frequently asked questions from the issues comments.
- Write a FAQ and answer the most frequently asked questions.
ai_name: GitHubIssuesFAQ-Ai
ai_role: an AI designed to autonomously manage GitHub issues to make it easier for users to find solutions to their issues.
advice.txt contains
- Use the data saved in your memory as it already has all the JSON data from the repos you are watching.
Here is the script I used to generate the JSON file:
```python
import requests
import json


def get_all_issues(repo_url, access_token):
    api_url = repo_url.replace("https://github.com", "https://api.github.com/repos")
    api_url += "/issues"
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"token {access_token}"
    }
    params = {
        "state": "open",
        "per_page": 100,
        "page": 1
    }
    all_issues = []
    while True:
        response = requests.get(api_url, headers=headers, params=params)
        if response.status_code != 200:
            print(f"Error: {response.status_code}")
            break
        issues = response.json()
        if not issues:
            break
        all_issues.extend(issues)
        params["page"] += 1
    return all_issues


def get_comments_for_issue(issue_url, access_token):
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"token {access_token}"
    }
    params = {
        "per_page": 100,
        "page": 1
    }
    all_comments = []
    while True:
        response = requests.get(issue_url, headers=headers, params=params)
        if response.status_code != 200:
            print(f"Error: {response.status_code}")
            break
        comments = response.json()
        if not comments:
            break
        all_comments.extend(comments)
        params["page"] += 1
    return all_comments


def save_issues_to_file(issues, output_file):
    with open(output_file, "w") as f:
        json.dump(issues, f, indent=4)


def main():
    repo_url = "https://github.com/Significant-Gravitas/Auto-GPT"
    access_token = "your_github_pat_token_here"
    output_file = "issues_data.json"
    issues = get_all_issues(repo_url, access_token)
    # Get comments for each issue
    for issue in issues:
        comments = get_comments_for_issue(issue["comments_url"], access_token)
        issue["comments"] = comments
    save_issues_to_file(issues, output_file)
    print(f"Saved {len(issues)} issues with comments to {output_file}")


if __name__ == "__main__":
    main()
```
> read issues_data.json' and after 5 minutes of the data it's reading scrolling up the screen I get an error that reads: openai.error.InvalidRequestError: This model's maximum context length is 8191 tokens, however you requested 7181084 tokens (7181084 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.
That's because your Auto-GPT sent the whole JSON file to OpenAI.
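One workaround is to split the file into overlapping chunks that fit the context window before anything is sent to the API, in the same spirit as the `--max_length 3000 --overlap 300` flags used at seeding time (a minimal character-based sketch; whether data_ingestion.py splits by characters or tokens isn't something this sketch claims):

```python
def split_text(text: str, max_length: int = 3000, overlap: int = 300) -> list[str]:
    """Split text into overlapping chunks small enough to summarize one at a time."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_length, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back so adjacent chunks share some context
    return chunks
```

Each chunk can then be summarized separately and the summaries combined, instead of pushing a 7-million-token prompt in one request.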
Currently, there's no real command for Auto-GPT to access its memory, only to add to it, though I have had it execute shell code to access memory. I have not read all of the code thoroughly, but I don't think there is logic for reflecting on stored memory yet; I could have overlooked it.
I played with adding a 'memory_get' command for getting Redis memories today, if you want to try it. I haven't really tested this yet, but I can say that Auto-GPT will use the memory_get and memory_add commands. With proper prompting it might be possible to make use of memory to some degree, but no promises: https://github.com/Slowly-Grokking/Auto-GPT/tree/redis_memory_retrieval_TESTING
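The shape of it is roughly the sketch below; the names (memory_get, get_relevant, num_relevant) are illustrative assumptions, not the exact code in that branch:

```python
def memory_get(memory, query: str, num_relevant: int = 5) -> str:
    """Return the most relevant stored memory chunks for a query.

    Assumes the configured backend (e.g. a Redis-backed memory) exposes a
    relevance search like get_relevant(query, k); purely a sketch, not the
    actual Auto-GPT command implementation.
    """
    results = memory.get_relevant(query, num_relevant)
    return "\n\n".join(str(chunk) for chunk in results)
```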
I think we will need to program some basic logical steps for Auto-GPT to use as building blocks, like Read > Summarize > Save, and Check task list > do task > cross task off, repeat... type stuff, in order for functionality to really take off. Something like the sketch below is what I mean.
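A pure control-flow sketch of one such building block, with hypothetical placeholder functions (split_text, summarize, save_to_memory) standing in for whatever commands Auto-GPT ends up exposing:

```python
def ingest_file(path: str, split_text, summarize, save_to_memory) -> None:
    """Read > Summarize > Save: a sketch of one reusable building block.

    split_text, summarize and save_to_memory are hypothetical placeholders,
    not existing Auto-GPT commands.
    """
    with open(path, encoding="utf-8") as f:
        text = f.read()
    for chunk in split_text(text):    # Read
        summary = summarize(chunk)    # Summarize
        save_to_memory(summary)       # Save
```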
~~Seems to be covered by #2801~~
This issue is kind of described poorly, but the main issue OP is reporting is not the chunking issue. It's the failure to use memories, which are different than stored files. This has not been covered, unless someone has added commands to make use of memory.get(). I haven't had a chance to go through all the reorg commits this week, but it doesn't appear that we've gained this functionality.
Thanks for correcting me @Slowly-Grokking
Oh, no worries. I really just wanted to make note of this if even for myself to see later.
This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.
This issue was closed automatically because it has been stale for 10 days with no activity.