Add option to run patch evaluation on Modal
- [ ] This change is worth documenting at https://docs.all-hands.dev/
- [ ] Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below
End-user friendly description of the problem this fixes or functionality this introduces.
Summarize what the PR does, explaining any non-trivial design decisions.
This PR adds an option to run patch evaluation on Modal, using the official Modal support in SWE-bench.
Link of any specific issues this addresses:
Thank you for this! I'm going to try it ASAP.
@enyst I'm actually seeing some issues with this. After playing around with the official SWE-bench repo a few times, I found that results from running on Modal are still not as reliable as running locally :( For instance, with gold patches the local eval gave 2/2 whereas the Modal eval gave 0/2, and with some OH patches the local eval also gave 2/2 while the Modal eval gave only 1/2. This issue in the repo also seems to mention it: https://github.com/SWE-bench/SWE-bench/issues/377. Please give it a try and LMK how it goes for you.
I dug into this a bit, and this SWE-bench PR seems to fix it: https://github.com/SWE-bench/SWE-bench/pull/402. Let's wait for the SWE-bench team.