OpenHands How to reproduce OpenHands' performance on SWE-Bench-Verified.

Open BoxiYu opened this issue 10 months ago • 1 comments

Hi, I am trying to reproduce OpenHands' score on SWE-Bench-Verified. Could you please provide some instructions for reproducing it? Many thanks.

Jan 15 '25 14:01 BoxiYu

If you're trying to run the tests yourself, take a look at the instructions in the README here. You'll have to manually replace all references to princeton-nlp/SWE-bench_Lite with princeton-nlp/SWE-bench_Verified.
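The replacement can be scripted rather than done by hand. Below is a minimal sketch, assuming the references live in plain-text files (shell scripts, Python files, Markdown, TOML) under a local checkout. The function name `replace_dataset_refs` and the suffix list are illustrative assumptions, not part of OpenHands.

```python
from pathlib import Path

# Dataset identifiers taken from the comment above.
OLD = "princeton-nlp/SWE-bench_Lite"
NEW = "princeton-nlp/SWE-bench_Verified"


def replace_dataset_refs(root, old=OLD, new=NEW,
                         suffixes=(".sh", ".py", ".md", ".toml")):
    """Rewrite every occurrence of `old` to `new` in text files under `root`.

    Hypothetical helper: the suffix list is an assumption about which
    files mention the dataset. Returns the list of files that changed.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        # Only touch regular files with one of the expected extensions.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip files that are not valid UTF-8 text
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed.append(path)
    return changed
```

Calling `replace_dataset_refs("path/to/checkout")` (a placeholder path) returns the files it modified, so the changes can be reviewed with `git diff` before running the evaluation.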

Jan 15 '25 16:01 csmith49

Thank you so much, @csmith49. I will try it.

Jan 17 '25 08:01 BoxiYu

Could you please provide the hyperparameters (for example, the config.toml) needed to reproduce the score of openhand-codeact-2.1 (claude-sonnet) on the SWE-bench leaderboard? @csmith49

I used the default settings with Claude, and the only output I get is:

ERROR:root:<class 'RuntimeError'>: Maximum error retries reached for instance astropy__astropy-12907
Instances processed:   0%|                                                                                                                                  | 0/300 [00:25<?, ?it/s]

Jan 22 '25 13:01 BoxiYu

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

Feb 22 '25 01:02 github-actions[bot]

This issue was closed because it has been stalled for over 30 days with no activity.

Mar 01 '25 02:03 github-actions[bot]
