[Feature]: Create a better leaderboard for OpenHands

Open openhands-agent opened this issue 1 year ago • 4 comments

What problem or use case are you trying to solve?

Currently, there is no comprehensive leaderboard that tracks the performance of open-source models within OpenHands. A clear leaderboard would make it easier to compare models and to improve them.

Describe the UX of the solution you'd like

An online sheet that continuously updates and tracks performance metrics for various models, possibly with the ability to click on links for deeper insights specific to each model.

Do you have thoughts on the technical implementation?

The leaderboard could use existing performance data stored in a central database, automatically populated and refreshed at regular intervals so that the rankings stay close to current.
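
To make the idea above concrete, here is a minimal sketch of the aggregation step a periodic refresh job might run. It is only an illustration: the `EvalResult` record, its field names, and `build_leaderboard` are hypothetical and do not correspond to any existing OpenHands schema or API. It assumes each stored result records how many tasks a model resolved on a given benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """One stored evaluation run (hypothetical schema, for illustration only)."""
    model: str       # model identifier
    benchmark: str   # benchmark name
    resolved: int    # number of tasks solved
    total: int       # number of tasks attempted


def build_leaderboard(results: list[EvalResult], benchmark: str) -> list[tuple[str, float]]:
    """Return (model, best resolve rate) pairs for one benchmark, best first."""
    best: dict[str, float] = {}
    for r in results:
        # Skip other benchmarks and empty runs (avoids division by zero).
        if r.benchmark != benchmark or r.total == 0:
            continue
        rate = r.resolved / r.total
        # Keep only the best run seen so far for each model.
        best[r.model] = max(rate, best.get(r.model, 0.0))
    # Sort by rate descending, then by model name so ties have a stable order.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))
```

A scheduled job could call `build_leaderboard` on the rows read from the database and write the result to the online sheet. Keeping only the best run per model is one possible design choice; averaging runs or showing the most recent one would work just as well.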

Describe alternatives you've considered

A static document or manual tracking, but both lack the dynamic updates and timeliness that a real leaderboard would provide.

Additional context

This initiative was discussed in a recent Slack thread, where users expressed interest in better ways to track performance.

Issue Created By: Graham Neubig on Slack

openhands-agent • Dec 22 '24 01:12

Currently this doc is our best leaderboard: https://docs.google.com/spreadsheets/d/1wOUdFCMyY6Nt0AIqF705KN4JKOWgeI4wUGUP60krXXs/edit?gid=0#gid=0

neubig • Dec 31 '24 01:12

This is probably a separate feature request, but tying in with the feedback method: could users run a benchmark on their own setup and submit the results from the UI (not SWE, maybe something simpler or lite)?

edit: disregard, just noticed https://github.com/All-Hands-AI/OpenHands/issues/5924

JohnsterID • Dec 31 '24 02:12

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] • Jan 31 '25 18:01

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] • Mar 09 '25 01:03

This issue was closed because it has been stalled for over 30 days with no activity.

github-actions[bot] • Mar 16 '25 02:03