AutoGPT
New Challenges: Information Retrieval for SpaceX, Anthropic, AutoGPT, Milvus
Background
This is an information retrieval challenge that gets AutoGPT to find new information that the LLM does not already know.
Changes
Only the files for the challenge are changed.
Documentation
Test Plan
`pytest tests/integration/challenges/information_retrieval/test_information_retrieval_challenge_b.py -v`
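For context, the challenge tests in this suite generally drive the agent for a bounded number of interaction cycles and then assert on the file it was asked to write. The sketch below only illustrates that shape; the fixture name, the `run_interaction_loop` helper import, and the asserted string are assumptions, not the actual test.

```python
# Assumed helper, modelled on the existing challenge tests in this repo.
from tests.integration.challenges.utils import run_interaction_loop

CYCLE_COUNT = 3  # upper bound on think/act cycles allowed for this challenge


def test_information_retrieval_challenge_b(information_retrieval_agent, monkeypatch) -> None:
    # Drive the agent for at most CYCLE_COUNT cycles, then stop it.
    run_interaction_loop(monkeypatch, information_retrieval_agent, CYCLE_COUNT)

    # The challenge asks the agent to save its answer to output.txt in its workspace.
    content = information_retrieval_agent.workspace.get_path("output.txt").read_text()

    # Placeholder assertion: the real test checks for the specific retrieved fact.
    assert "expected answer" in content
```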
PR Quality Checklist
- [x] My pull request is atomic and focuses on a single change.
- [ ] I have thoroughly tested my changes with multiple different prompts.
- [x] I have considered potential risks and mitigations for my changes.
- [x] I have documented my changes clearly and comprehensively.
- [x] I have not snuck in any "extra" small tweaks or changes.
Deployment failed with the following error:
Resource is limited - try again in 43 minutes (more than 100, code: "api-deployments-free-per-day").
Related to #4244
Codecov Report
Patch coverage has no change and project coverage change: +0.74 :tada:
Comparison is base (0839a16) 62.18% compared to head (af17010) 62.92%.
:exclamation: Current head af17010 differs from pull request most recent head c96a234. Consider uploading reports for the commit c96a234 to get more accurate results.
Additional details and impacted files
@@ Coverage Diff @@
## master #4245 +/- ##
==========================================
+ Coverage 62.18% 62.92% +0.74%
==========================================
Files 73 73
Lines 3345 3345
Branches 484 484
==========================================
+ Hits 2080 2105 +25
+ Misses 1118 1093 -25
Partials 147 147
:umbrella: View full report in Codecov by Sentry.
The latest updates on your projects. Learn more about Vercel for Git ↗︎
1 Ignored Deployment
| Name | Status | Preview | Comments | Updated (UTC) |
|---|---|---|---|---|
| docs | ⬜️ Ignored (Inspect) | Visit Preview | | May 17, 2023 11:18pm |
This challenge passes locally
This PR exceeds the recommended size of 200 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size
an autonomous agent that specializes in researching and saving files as output.txt
It is all in agent_factory. I am not fully sure, but my intuition is that it is related to recent changes in the memory handlers: the read_file and write_file implementations differ slightly, and it looks like the agent wants to save to output.txt but reads from arxiv_paper.txt.
2023-05-16T20:28:22.0290790Z
Save the name of the main author to output.txt
\"name\": \"read_file\",\n \"args\": {\n \"filename\": \"arxiv_paper.txt\"\n
Updated challenges and marked the most difficult ones, which aren't passing consistently, to be skipped
updated to include only the two tests that pass more consistently
at this point my thoughts are only, wtf is with the test infrastructure
sure! what questions do you have, more specifically?
Hey, I'm glad you asked. First, I see that you're struggling with a problem related to files, which was the main focus of my previous question. I believe it might be tied to the functioning of the test infrastructure, but I lack complete insights here.
However, since we're already in a discussion, I'd like to present my second point as both an opinion and a question. Why do integration tests validate that AutoGPT works when it's written in such a static way, still relying on bypasses and cycles? From what I can see in the code, you're manipulating the number of these cycles in hopes of merely passing the test for its own sake. But doesn't this approach fall short when tested on a larger scale? It seems you'd need to infinitely increase these cycles when creating more tests. Isn't this counterproductive?
This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.
@piotrmasior join us on Discord so we can talk about it https://discord.gg/autogpt
> still relying on bypasses and cycles
yeah, I don't like this bypass either; we will change it to a budget OR define the number of cycles when initializing the agent, to really make it a PURE end-to-end test without mocks in the middle that rely on the underlying implementation.
> infinitely increase these cycles

right now each test can be done in less than 5 cycles on average, so for now we don't have this problem, but soon we will
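To make the "budget OR a cycle count set at initialization" idea concrete, here is a minimal sketch (not actual AutoGPT code; all names are illustrative) of an interaction loop that stops on either limit instead of relying on mocks inside the loop:

```python
from dataclasses import dataclass


@dataclass
class RunLimits:
    """Stopping criteria passed in when the agent is created, not patched in by the test."""
    max_cycles: int | None = None        # e.g. 5 for a challenge test
    max_budget_usd: float | None = None  # e.g. 0.10 to cap API spend


def run_agent(agent, limits: RunLimits) -> None:
    cycles = 0
    spent = 0.0
    while True:
        if limits.max_cycles is not None and cycles >= limits.max_cycles:
            break  # bounded by cycle count instead of mocks in the loop
        if limits.max_budget_usd is not None and spent >= limits.max_budget_usd:
            break  # alternatively bounded by an API-cost budget
        result = agent.run_one_cycle()  # hypothetical: one think/act iteration
        spent += result.cost_usd        # hypothetical: cost reported by the cycle
        cycles += 1
```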