feat: add commands for swebench
Should this code be in the ./evaluation dir? Why is it in codeact?
Yeah, I really wasn't sure where to put things, or whether these commands should be just for the eval or something the agent is generally augmented with.
This is looking good to me! Maybe we should ask the eval folks?
I actually don't think the evaluation depends on the agent / OpenDevin design. It only needs one JSONL file that contains one patch output for each evaluation instance, so I think our current implementation should be fine?
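For illustration, a rough sketch of what producing that file could look like. The field names `instance_id` and `model_patch` and the helper names are assumptions made for this example, not something confirmed in this PR, so check them against whatever the eval harness actually expects.

```python
import json
from pathlib import Path


def write_predictions(predictions, path):
    """Write one JSON object per line, one line per evaluation instance.

    `predictions` maps an instance id to the patch text produced for it.
    The keys "instance_id" and "model_patch" are assumed field names.
    """
    with Path(path).open("w", encoding="utf-8") as f:
        for instance_id, patch in predictions.items():
            record = {"instance_id": instance_id, "model_patch": patch}
            f.write(json.dumps(record) + "\n")


def read_predictions(path):
    """Read the JSONL file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Since the file is the only interface, the evaluation side would not need to know which agent, or which set of commands, produced the patches.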
Are these functions meant for SWE BENCH, as the title indicates, or for SWE AGENT, as described in this issue: https://github.com/OpenDevin/OpenDevin/issues/570
Well, these functions are akin to what the folks from SWE Agent did but are meant to improve performance on SWE Bench. So pick whichever you like. It also seems like we're thinking of applying this stuff more broadly even beyond both.