OpenHands Add option to run patch evaluation on Modal

[ ] This change is worth documenting at https://docs.all-hands.dev/
[ ] Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

End-user friendly description of the problem this fixes or functionality this introduces.

Summarize what the PR does, explaining any non-trivial design decisions.

This PR adds an option to run patch evaluation on Modal, using the official Modal support in SWE-bench.

Link of any specific issues this addresses:

May 21 '25 07:05 ryanhoangt

Thank you for this! I'm going to try it ASAP.

May 21 '25 12:05 enyst

@enyst I'm actually seeing some issues with this. After trying the official SWE-bench repo a few times, I found that results from running on Modal are still less reliable than results from running locally :( For instance, with gold patches the local eval gave 2/2 whereas the Modal eval gave 0/2, and with some OH patches the local eval also gave 2/2 while the Modal eval gave only 1/2. This issue in the SWE-bench repo seems to describe the same problem: https://github.com/SWE-bench/SWE-bench/issues/377. Please give it a try and let me know how it goes for you.

May 21 '25 12:05 ryanhoangt

I dug into this a bit, and this PR seems to fix it. Let's wait for the SWE-bench team: https://github.com/SWE-bench/SWE-bench/pull/402

May 22 '25 17:05 ryanhoangt