AutoGPT
setup challenges score
Background
New behavior:
- `current_score.json` is updated on every CI pipeline run. If Auto-GPT's performance didn't change, nothing is committed. If a new challenge has been added, it is added to the current score. For now, the tests still don't attempt to beat new challenges and only run previously beaten ones. I am working on that, and I need the current changes to be merged first in order to achieve it.
Current behavior:
- when creating a PR we don't have a challenge score where we can see the current performance of Auto-GPT
- we don't really have one central file that describes Auto-GPT's current performance, except the docs but they could be out of date.
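The "update on every run, commit only on change" behavior described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the helper name `update_score_file` and the flat `{"challenge_name": {"beaten": bool}}` schema are assumptions made for the example.

```python
import json
from pathlib import Path


def update_score_file(path, results):
    """Merge `results` into the score file and write it only if it changed.

    Returns True when the file was rewritten, i.e. when a CI job would
    have something to commit. The schema used here is hypothetical.
    """
    path = Path(path)
    previous = json.loads(path.read_text()) if path.exists() else {}

    # Start from the previous score so challenges that did not run are kept;
    # entries from this run replace older entries with the same name.
    merged = {**previous, **results}

    if merged == previous:
        return False  # performance unchanged: nothing to commit

    path.write_text(json.dumps(merged, indent=2, sort_keys=True) + "\n")
    return True
```

A CI step could call this after the test run and skip the commit whenever it returns `False`.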
Changes
- added a challenge decorator that will build the current_score as the tests run
- the decorator decides which regression tests to run based on the previous `current_score.json`. This doesn't solve the problem of beating new challenges yet, but that is easy to add to the decorator; I just want to do it step by step.
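As a rough sketch of the idea behind such a decorator (not the code in this PR), the example below skips challenges that were not beaten previously and records the outcome of each one as it runs. The names `challenge`, `PREVIOUS_SCORE`, and `NEW_SCORE`, the in-memory dictionaries, and the score schema are all assumptions for illustration; `unittest.SkipTest` from the standard library stands in for whatever skip mechanism the test runner provides.

```python
import functools
import unittest

# Hypothetical stand-ins: the score loaded from the previous
# current_score.json, and the score being built during this run.
PREVIOUS_SCORE = {"test_memory_challenge_a": {"beaten": True}}
NEW_SCORE = {}


def challenge(func):
    """Run a challenge only if it was beaten before, and record the result."""
    name = func.__name__

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        previously_beaten = PREVIOUS_SCORE.get(name, {}).get("beaten", False)
        if not previously_beaten:
            # New or unbeaten challenge: register it, but do not attempt it.
            NEW_SCORE[name] = {"beaten": False}
            raise unittest.SkipTest(f"{name} has not been beaten yet")
        try:
            result = func(*args, **kwargs)
        except AssertionError:
            # Regression: a previously beaten challenge now fails.
            NEW_SCORE[name] = {"beaten": False}
            raise
        NEW_SCORE[name] = {"beaten": True}
        return result

    return wrapper


@challenge
def test_memory_challenge_a():
    assert 1 + 1 == 2  # placeholder for a real challenge check


@challenge
def test_new_challenge():
    raise AssertionError("never reached: skipped before the body runs")
```

Attempting new challenges could later be added in the branch that currently skips, which matches the step-by-step approach described above.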
Documentation
The end user will receive a link in the PR so that they can merge the current score and the cassettes if either changed.
Test Plan
PR Quality Checklist
- [x] My pull request is atomic and focuses on a single change.
- [x] I have thoroughly tested my changes with multiple different prompts.
- [x] I have considered potential risks and mitigations for my changes.
- [ ] I have documented my changes clearly and comprehensively.
- [x] I have not snuck in any "extra" small tweaks or changes.
The latest updates on your projects.
1 Ignored Deployment
| Name | Status | Preview | Comments | Updated (UTC) |
|---|---|---|---|---|
| docs | ⬜️ Ignored (Inspect) | | | May 23, 2023 0:57am |
This PR exceeds the recommended size of 200 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size
Codecov Report
Patch and project coverage have no change.
Comparison is base (9c7f567) 64.76% compared to head (c2b819d) 64.76%.
:exclamation: Current head c2b819d differs from pull request most recent head 88fb637. Consider uploading reports for the commit 88fb637 to get more accurate results
Additional details and impacted files
| Coverage Diff | master | #4259 | +/- |
|---|---|---|---|
| Coverage | 64.76% | 64.76% | |
| Files | 75 | 75 | |
| Lines | 3542 | 3542 | |
| Branches | 520 | 520 | |
| Hits | 2294 | 2294 | |
| Misses | 1078 | 1078 | |
| Partials | 170 | 170 | |
Is this ready for review?
This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.