What are we expected to submit for leaderboard integration?

Open zhimin-z opened this issue 1 year ago • 2 comments

I checked https://www.swebench.com and found this: [image]

zhimin-z, Mar 04 '24 00:03

We're still reviewing the process for evaluating submissions. For now, we'd prefer results accompanied by a public or soon-to-be-public paper or technical report, along with the generated patches, which we can use to verify performance. Each submission should include a description of the evaluation setting, which will be categorized as assisted or unassisted.

carlosejimenez, Mar 05 '24 16:03

Hello Carlos! @carlosejimenez

I wanted to follow up on the experimental results we submitted via email two days ago. We are keen to understand the review process as it's crucial for our project's next steps. I would greatly appreciate any information you could provide on this matter. Thank you very much for your understanding and support, and I look forward to your prompt response.

Thanks in advance!

itaowei, Mar 08 '24 03:03

@zhimin-z @itaowei Just a small update: thanks for your patience. We will be finalizing this and posting about it by the end of this month.

In short, it will require sending us 1. your execution .log files and 2. the predictions generated by your model. We will use both to verify the reported numbers, and we will also make them accessible via the SWE-bench leaderboard on the website!

john-b-yang, Apr 16 '24 14:04

Got it. Thanks!

zhimin-z, Apr 16 '24 14:04
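
For readers preparing a submission, the two artifacts named in the thread (execution .log files and model predictions) could be gathered into one archive along the following lines. This is only a sketch under assumptions: the function name `bundle_submission`, the file name `all_preds.jsonl`, and the archive layout are hypothetical and not specified by the maintainers, and the three prediction keys reflect the prediction format as commonly documented for SWE-bench, so check the repository before relying on them.

```python
import json
import tarfile
import tempfile
from pathlib import Path

# Keys assumed for each prediction record; verify against the SWE-bench docs.
REQUIRED_KEYS = {"instance_id", "model_name_or_path", "model_patch"}


def bundle_submission(predictions, log_dir, out_path):
    """Pack predictions (as JSONL) and every *.log file under log_dir into a tar.gz.

    The archive layout is a hypothetical convention, not an official one.
    Returns the relative paths of the log files that were included.
    """
    for pred in predictions:
        missing = REQUIRED_KEYS - pred.keys()
        if missing:
            raise ValueError(f"prediction missing keys: {sorted(missing)}")

    log_root = Path(log_dir)
    logs = sorted(log_root.rglob("*.log"))
    if not logs:
        raise FileNotFoundError(f"no .log files under {log_root}")

    with tempfile.TemporaryDirectory() as tmp:
        # Write one JSON object per line so the file can be streamed.
        preds_file = Path(tmp) / "all_preds.jsonl"
        with preds_file.open("w", encoding="utf-8") as fh:
            for pred in predictions:
                fh.write(json.dumps(pred) + "\n")

        with tarfile.open(out_path, "w:gz") as tar:
            tar.add(preds_file, arcname="all_preds.jsonl")
            for log in logs:
                rel = log.relative_to(log_root).as_posix()
                tar.add(log, arcname=f"logs/{rel}")

    return [log.relative_to(log_root).as_posix() for log in logs]
```

Calling `bundle_submission(preds, "path/to/logs", "submission.tar.gz")` would produce a single file to attach, with the predictions and logs kept side by side so the reported numbers can be checked against both.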