OpenHands feat: add commands for swebench

Apr 03 '24 23:04 Sparkier

Should this code be in the ./evaluation dir? Why is it in codeact?

Apr 04 '24 03:04 rbren

Yeah, I really wasn't sure where to put things, or whether these commands should be eval-only or something the agent is generally augmented with.

Apr 04 '24 03:04 Sparkier

This is looking good to me! Maybe we should ask the eval folks?

Apr 05 '24 03:04 rbren

I actually don't think evaluation depends on the agent / OpenDevin design. It only needs one JSONL file that contains one patch output for each evaluation instance, so I think our current implementation should be fine?
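To make the "one JSONL file, one patch per instance" point concrete, here is a minimal sketch of how such a file could be produced. The field names (`instance_id`, `model_name_or_path`, `model_patch`) are my assumption about what the SWE-bench harness expects, and `predictions_to_jsonl`, the model name, and the instance IDs are hypothetical names for illustration, not anything in this PR.

```python
import json


def predictions_to_jsonl(predictions, model_name="example-agent"):
    """Serialize {instance_id: patch} into JSONL text, one record per line.

    The key names below are assumed, not taken from this PR.
    """
    lines = []
    for instance_id, patch in predictions.items():
        record = {
            "instance_id": instance_id,        # which evaluation instance this is
            "model_name_or_path": model_name,  # hypothetical label for the agent
            "model_patch": patch,              # the diff the agent produced
        }
        # json.dumps escapes newlines inside the patch, so each record
        # stays on a single physical line.
        lines.append(json.dumps(record))
    return ("\n".join(lines) + "\n") if lines else ""


if __name__ == "__main__":
    # Hypothetical instance IDs and patches, for illustration only.
    example = {
        "example__repo-1": "diff --git a/foo.py b/foo.py\n",
        "example__repo-2": "",
    }
    print(predictions_to_jsonl(example), end="")
```

If that assumption holds, the evaluation side only reads this file, so where the agent commands live should not matter to it.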

Apr 05 '24 04:04 xingyaoww

Are these functions meant for SWE-BENCH, as the title indicates, or for SWE-AGENT, as described in this issue: https://github.com/OpenDevin/OpenDevin/issues/570

Apr 05 '24 12:04 foragerr

Are these functions meant for swe BENCH, as the title indicates? or for swe AGENT as described in this issue: #570

Well, these functions are akin to what the SWE-agent folks did, but they are meant to improve performance on SWE-bench. So pick whichever you like. It also seems like we're thinking of applying this more broadly, beyond both.

Apr 05 '24 16:04 Sparkier