[Eval] Integrate MLE Bench Into Eval Harness
What problem or use case are you trying to solve?
OpenAI released MLE Bench: https://arxiv.org/pdf/2410.07095, which evaluated an earlier version of OpenHands. We should try to integrate the benchmark into our eval harness: https://github.com/All-Hands-AI/OpenHands/tree/main/evaluation
The code that was used to evaluate OpenHands on MLE Bench is available here: https://github.com/openai/mle-bench/tree/main/agents/opendevin
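For illustration only, here is a rough sketch of what an entry point under `evaluation/` might look like for this benchmark. The names `load_competitions` and `run_agent_on_task`, the split-file format, and the JSONL output layout are all hypothetical placeholders, not the actual OpenHands or mle-bench APIs; the real integration would reuse the harness's shared utilities and the mle-bench grading tooling.

```python
"""Sketch of an MLE-Bench runner under evaluation/ (illustrative only)."""
import argparse
import json
from pathlib import Path


def load_competitions(split_file: Path) -> list[str]:
    # Hypothetical: one competition ID per line in a split file.
    return [line.strip() for line in split_file.read_text().splitlines() if line.strip()]


def run_agent_on_task(competition_id: str, model: str) -> dict:
    # Hypothetical stand-in for launching the agent on a competition's
    # workspace and collecting its submission for grading.
    raise NotImplementedError("wire this to the agent runtime and mle-bench grader")


def main() -> None:
    parser = argparse.ArgumentParser(description="Run an agent on MLE-Bench tasks (sketch).")
    parser.add_argument("--split", type=Path, required=True, help="file listing competition IDs")
    parser.add_argument("--model", default="gpt-4o", help="LLM backing the agent")
    parser.add_argument("--output", type=Path, default=Path("output.jsonl"))
    args = parser.parse_args()

    with args.output.open("w") as out:
        for comp_id in load_competitions(args.split):
            result = run_agent_on_task(comp_id, args.model)
            out.write(json.dumps({"competition_id": comp_id, **result}) + "\n")


if __name__ == "__main__":
    main()
```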
Thank you for your excellent work. May I ask when MLE-Bench will be supported?
We have a PR open in #5148, and I think @csmith49 is working on it?
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
@xingyaoww Do you know if this is still a priority? It's currently on our public GitHub roadmap. If it's no longer a priority or being worked on, then I will move it off roadmap.
@jpelletier1 I think it is no longer a priority; feel free to move it off the roadmap.
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for over 30 days with no activity.