
add Visual SWE-bench benchmark

Open luolin101 opened this issue 9 months ago • 4 comments

End-user friendly description of the problem this fixes or functionality that this introduces.


Give a summary of what the PR does, explaining any non-trivial design decisions.

Visual SWE-bench focuses on visual issues. Its structure is similar to SWE-bench, except that each problem statement includes visual data. This PR enables OpenHands to use the evaluation Docker image for both inference and evaluation on this benchmark. It builds on #5911.


Link of any specific issues this addresses.

luolin101, Mar 06 '25 16:03