[Eval] Integrate MLE Bench Into Eval Harness
What problem or use case are you trying to solve?
OpenAI released MLE Bench: https://arxiv.org/pdf/2410.07095, which evaluated an earlier version of OpenHands. We should try to integrate the benchmark into our eval harness: https://github.com/All-Hands-AI/OpenHands/tree/main/evaluation
The code that was used to evaluate OpenHands on MLE Bench is available here: https://github.com/openai/mle-bench/tree/main/agents/opendevin
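For illustration only, here is a rough sketch of what an entry point under `evaluation/` might look like for this benchmark. The names `load_competitions` and `run_agent_on_task`, the split-file format, and the JSONL output layout are all hypothetical placeholders, not the actual OpenHands or mle-bench APIs; the real integration would reuse the harness's shared utilities and the mle-bench grading tooling.

```python
"""Sketch of an MLE-Bench runner under evaluation/ (illustrative only)."""
import argparse
import json
from pathlib import Path


def load_competitions(split_file: Path) -> list[str]:
    # Hypothetical: one competition ID per line in a split file.
    return [line.strip() for line in split_file.read_text().splitlines() if line.strip()]


def run_agent_on_task(competition_id: str, model: str) -> dict:
    # Hypothetical stand-in for launching the agent on a competition's
    # workspace and collecting its submission for grading.
    raise NotImplementedError("wire this to the agent runtime and mle-bench grader")


def main() -> None:
    parser = argparse.ArgumentParser(description="Run an agent on MLE-Bench tasks (sketch).")
    parser.add_argument("--split", type=Path, required=True, help="file listing competition IDs")
    parser.add_argument("--model", default="gpt-4o", help="LLM backing the agent")
    parser.add_argument("--output", type=Path, default=Path("output.jsonl"))
    args = parser.parse_args()

    with args.output.open("w") as out:
        for comp_id in load_competitions(args.split):
            result = run_agent_on_task(comp_id, args.model)
            out.write(json.dumps({"competition_id": comp_id, **result}) + "\n")


if __name__ == "__main__":
    main()
```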
Thank you for your excellent work. May I ask when MLE-Bench will be supported?
We have a PR open in #5148, and I think @csmith49 is working on it?
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
@xingyaoww Do you know if this is still a priority? It's currently on our public GitHub roadmap. If it's no longer a priority or being worked on, then I will move it off roadmap.
@jpelletier1 I think it is no longer a priority; feel free to move it off the roadmap.
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for over 30 days with no activity.