autogen
AutoGen meets SWE_bench
Why are these changes needed?
This PR provides a basic implementation for the SWE_bench benchmark (https://www.swebench.com). This PR is still in progress and not ready for review.
- [ ] Specify package prerequisites
- [ ] Documentation (notebook, blog post)
- [ ] Clean up the code.
🌹 Acknowledgement: The code is largely based on swe-agent.
@Hk669, please add more if you have other suggestions. @Hk669 will take primary responsibility for this work, building on this draft PR.
Related issue number
N/A
Checks
- [ ] I've included any doc changes needed for https://microsoft.github.io/autogen/. See https://microsoft.github.io/autogen/docs/Contribute#documentation to build and test documentation locally.
- [ ] I've added tests (if relevant) corresponding to the changes introduced in this PR.
- [ ] I've made sure all auto checks have passed.
️✅ There are no secrets present in this pull request anymore.
If these secrets were true positives and are still valid, we highly recommend that you revoke them. Once a secret has been leaked into a git repository, you should consider it compromised, even if it was deleted immediately.
Codecov Report
Attention: Patch coverage is 0% with 284 lines in your changes missing coverage. Please review.
Please upload a report for BASE (0.2@780523f).
| Files with missing lines | Patch % | Lines |
|---|---|---|
| autogen/agentchat/contrib/swebench_agent.py | 0.00% | 263 Missing :warning: |
| autogen/agentchat/contrib/swebench_utils.py | 0.00% | 21 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## 0.2 #2933 +/- ##
======================================
Coverage ? 19.97%
======================================
Files ? 97
Lines ? 10953
Branches ? 2509
======================================
Hits ? 2188
Misses ? 8614
Partials ? 151
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 19.93% <0.00%> (?) |
Flags with carried forward coverage won't be shown.
@skzhang1 would you like to continue working on this draft PR?
@ekzhu Hi, I may not be able to work on it in the near future. This PR provides a basic implementation, and it works. I hope others in the community can build on it further. @Hk669
Thanks. Do you think we can close this and revisit the idea once the new version is merged? It can be part of agbench.
Sure @ekzhu, we can revisit after the new version.