SWE-bench
What are we expected to submit for leaderboard integration?
I checked https://www.swebench.com and found:
We're still reviewing the process for evaluating submissions. For now, we'd prefer results with a public or soon-to-be-public paper or technical report and the generated patches we can use to verify performance. Each submission should include a description of the evaluation setting which will be categorized as assisted or unassisted.
Hello Carlos! @carlosejimenez
I wanted to follow up on the experimental results we submitted via email two days ago. We are keen to understand the review process as it's crucial for our project's next steps. I would greatly appreciate any information you could provide on this matter. Thank you very much for your understanding and support, and I look forward to your prompt response.
Thanks in advance!
@zhimin-z @itaowei just a small update - thanks for your patience, we will be finalizing this and posting about it by the end of this month.
In short, it will require sending us 1. your execution .log files and 2. the predictions generated by your model. We will use both to verify the reported numbers, and we will also make them accessible via the SWE-bench leaderboard on the website!
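For anyone preparing the two items described above, here is a minimal sketch of one way to bundle them into a single archive. It is only an illustration: the function name `bundle_submission`, the `logs/` folder inside the archive, and the use of a zip file are all assumptions, not a format the SWE-bench maintainers have specified.

```python
import zipfile
from pathlib import Path


def bundle_submission(log_dir, predictions_file, out_zip):
    """Zip every *.log file under log_dir together with the predictions file.

    Hypothetical helper: the archive layout is an assumption, not an
    official SWE-bench submission format.
    Returns the sorted list of archive member names.
    """
    log_dir = Path(log_dir)
    predictions_file = Path(predictions_file)
    if not predictions_file.is_file():
        raise FileNotFoundError(f"missing predictions file: {predictions_file}")

    # Collect execution logs recursively; fail early if there are none.
    logs = sorted(log_dir.rglob("*.log"))
    if not logs:
        raise FileNotFoundError(f"no .log files found under {log_dir}")

    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        # Assumed layout: logs/ holds execution logs, predictions sit at the top level.
        for log in logs:
            arcname = (Path("logs") / log.relative_to(log_dir)).as_posix()
            zf.write(log, arcname=arcname)
        zf.write(predictions_file, arcname=predictions_file.name)
        return sorted(zf.namelist())
```

Calling `bundle_submission("run_logs", "predictions.jsonl", "submission.zip")` (all three paths are placeholders) would produce one file to attach. Checking the returned member list before sending is a quick way to confirm nothing was left out.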
Got it. Thanks!