AutoGPT setup challenges score

Background

New behavior:

A current_score.json file is updated on every CI pipeline run. If the performance of Auto-GPT didn't change, nothing is committed. If a new challenge has been added, it is added to current_score.json. For now, the tests still only check previously beaten challenges and do not attempt to beat new ones. I am working on that, and I need the current changes to get merged in order to achieve it.
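The "commit only when something changed" behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names `merge_scores` and `write_if_changed` are hypothetical, and the score format (challenge name mapped to a beaten/not-beaten boolean) is an assumption made for the example.

```python
import json
from pathlib import Path


def merge_scores(previous: dict, results: dict) -> dict:
    """Overlay this run's results on the previous score.

    Assumed format (hypothetical): {"challenge_name": True/False}.
    Challenges seen for the first time simply appear as new keys.
    """
    merged = dict(previous)
    merged.update(results)
    return merged


def write_if_changed(path: Path, new_score: dict) -> bool:
    """Write the score file only if its content differs.

    Returns True when the file was written, so a CI step could use the
    result to decide whether there is anything to commit.
    """
    old_score = json.loads(path.read_text()) if path.exists() else {}
    if old_score == new_score:
        return False
    path.write_text(json.dumps(new_score, indent=2, sort_keys=True) + "\n")
    return True
```

Comparing the parsed dictionaries rather than the raw file text means that key order or whitespace differences alone would not trigger a commit.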

Current behavior:

When creating a PR, we don't have a challenge score that shows the current performance of Auto-GPT.
We don't have one central file that describes Auto-GPT's current performance, apart from the docs, which could be out of date.

Changes

Added a challenge decorator that builds the current score as the tests run.
The decorator decides which regression tests to run based on the previous current_score.json. This doesn't solve the problem of beating new challenges yet, but that is very easy to add to the decorator. I just want to do it step by step.

Documentation

To communicate this to the end user, a link is posted in the PR so that they can merge the current score and the cassettes if they changed.

Test Plan

PR Quality Checklist

[x] My pull request is atomic and focuses on a single change.
[x] I have thoroughly tested my changes with multiple different prompts.
[x] I have considered potential risks and mitigations for my changes.
[ ] I have documented my changes clearly and comprehensively.
[x] I have not snuck in any "extra" small tweaks changes

May 17 '23 03:05 waynehamadi

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment

Name	Status	Preview	Comments	Updated (UTC)
docs	⬜️ Ignored (Inspect)			May 23, 2023 0:57am

May 17 '23 03:05 vercel[bot]

This PR exceeds the recommended size of 200 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size

May 17 '23 03:05 github-actions[bot]

Codecov Report

Patch and project coverage have no change.

Comparison is base (9c7f567) 64.76% compared to head (c2b819d) 64.76%.

:exclamation: Current head c2b819d differs from pull request most recent head 88fb637. Consider uploading reports for the commit 88fb637 to get more accurate results

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #4259   +/-   ##
=======================================
  Coverage   64.76%   64.76%           
=======================================
  Files          75       75           
  Lines        3542     3542           
  Branches      520      520           
=======================================
  Hits         2294     2294           
  Misses       1078     1078           
  Partials      170      170

:umbrella: View full report in Codecov by Sentry.

May 17 '23 03:05 codecov[bot]

Is this ready for review?

May 21 '23 19:05 ntindle

This PR exceeds the recommended size of 200 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size

May 22 '23 20:05 github-actions[bot]

This PR exceeds the recommended size of 200 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size

May 23 '23 03:05 github-actions[bot]

This PR exceeds the recommended size of 200 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size

May 23 '23 12:05 github-actions[bot]

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

May 23 '23 14:05 github-actions[bot]
