
Add option to run patch evaluation on Modal

Open ryanhoangt opened this issue 7 months ago • 2 comments

  • [ ] This change is worth documenting at https://docs.all-hands.dev/
  • [ ] Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

End-user friendly description of the problem this fixes or functionality this introduces.


Summarize what the PR does, explaining any non-trivial design decisions.

This PR adds an option to run patch evaluation on Modal, using SWE-bench's official Modal support.
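SWE-bench's own harness is what runs the patches on Modal. Here is a rough sketch, assuming the `swebench.harness.run_evaluation` CLI and its `--modal true` flag as described in the SWE-bench README. The helper name `build_swebench_eval_command` and its defaults are hypothetical and are not part of this PR or of OpenHands. A local run and a Modal run differ only by one extra flag:

```python
from typing import List, Optional, Sequence


def build_swebench_eval_command(
    predictions_path: str,
    run_id: str,
    dataset_name: str = "princeton-nlp/SWE-bench_Lite",
    use_modal: bool = False,
    max_workers: int = 4,
    instance_ids: Optional[Sequence[str]] = None,
) -> List[str]:
    """Build an argv list for SWE-bench's evaluation harness (hypothetical helper).

    Assumes the harness flags documented in the SWE-bench README:
    --dataset_name, --predictions_path, --max_workers, --run_id,
    --instance_ids and --modal. This is not OpenHands' own wrapper.
    """
    cmd = [
        "python", "-m", "swebench.harness.run_evaluation",
        "--dataset_name", dataset_name,
        "--predictions_path", predictions_path,
        "--max_workers", str(max_workers),
        "--run_id", run_id,
    ]
    if instance_ids:
        # Restrict the run to specific instances, useful for small local-vs-Modal checks.
        cmd += ["--instance_ids", *instance_ids]
    if use_modal:
        # Ask the harness to evaluate on Modal instead of local Docker.
        cmd += ["--modal", "true"]
    return cmd
```

Running it would look like `subprocess.run(build_swebench_eval_command("preds.jsonl", "my-run", use_modal=True), check=True)`. That call assumes `swebench` is installed and Modal credentials are already configured.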


Link to any specific issues this addresses:

ryanhoangt • May 21 '25 07:05

Thank you for this! I'm going to try it ASAP.

enyst • May 21 '25 12:05

@enyst Actually, I'm seeing some issues with this. After trying the official SWE-bench repo a few times, I found that results from running on Modal are still not as reliable as running locally :( For instance, with gold patches the local eval gave 2/2 whereas the Modal eval gave 0/2, and with some OH patches the local eval again gave 2/2 while the Modal eval gave only 1/2. This issue in the SWE-bench repo describes something similar: https://github.com/SWE-bench/SWE-bench/issues/377. Please give it a try and LMK how it goes for you.
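One way to pin down discrepancies like these is to diff per-instance outcomes between the two backends. Below is a minimal sketch. It assumes each run has already been reduced to a hypothetical mapping from instance ID to a resolved flag, which is not SWE-bench's report format. The name `compare_backends` is illustrative only.

```python
from typing import Dict, List, Tuple


def compare_backends(
    local: Dict[str, bool], modal: Dict[str, bool]
) -> Tuple[str, str, List[str]]:
    """Return resolved counts for each backend plus the instances they disagree on.

    Inputs are hypothetical maps of instance_id -> resolved; only instances
    present in both runs are compared.
    """
    shared = sorted(local.keys() & modal.keys())
    local_resolved = sum(local[i] for i in shared)
    modal_resolved = sum(modal[i] for i in shared)
    # Instances whose outcome flips between backends are the ones worth rerunning.
    disagreements = [i for i in shared if local[i] != modal[i]]
    n = len(shared)
    return f"{local_resolved}/{n}", f"{modal_resolved}/{n}", disagreements
```

For example, `compare_backends({"inst-1": True, "inst-2": True}, {"inst-1": True, "inst-2": False})` returns `("2/2", "1/2", ["inst-2"])`. This uses placeholder IDs to mirror the shape of the OH-patch case above.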

ryanhoangt • May 21 '25 12:05

I dug into this a bit, and this PR seems to fix it; let's wait for the SWE-bench team: https://github.com/SWE-bench/SWE-bench/pull/402

ryanhoangt • May 22 '25 17:05