OpenHands
Add Agent Comparable to SWE-agent
These folks seem to be doing really well on SWE-bench!
https://swe-agent.com/
I think good first steps would be:
- [x] SWE-agent requires persistent Docker execution (1, 2). Our to-do here is to make our Docker sandbox keep all of its past state, as discussed here. I am working on a version based on `dockerpty`, but it seems pretty tricky to get working (draft PR #597; suggestions/PRs are welcome!)
- [ ] I think the critical component of SWE-agent's success lies in its hand-crafted tools (available as bash commands), such as `search_dir`, `open`, and `edit`. A very easy way for us to incorporate these would be to: (1) find these tools and implement them as bash commands (this is exactly how SWE-agent implements them; check this example), (2) pack these tools into our sandbox Docker image, and (3) tweak the prompt (like the one here) so the agent knows these additional tools are available. I think (1) and (2) are not hard to do; we probably only need to tweak our Dockerfile. They are good first issues, so feel free to start PRs!
- [ ] Besides these tools, SWE-agent has fairly sophisticated bash command parsing and blocklists. At this early stage, we could skip that to simplify our solution and let the agent interact with the system directly via bash, similar to CodeActAgent's pipeline: the agent can execute whatever it wants, and we return an error message (like this) when it uses a command on the blocklist.
- [ ] Hardest step: get our SWE-bench evaluation working and have our agents generate correct patches so that we can test them (we need some logic like this to capture the patch submission and test it properly). @libowen2121 is actively working on the first part (getting evaluation working); let him know if you are interested in contributing to this part!
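The persistent-execution requirement in the first checklist item boils down to keeping one shell process alive so that state such as the working directory and exported variables carries over between commands. The sketch below illustrates that idea with a local `bash` process; it is not the `dockerpty`-based approach from the draft PR. The `PersistentBash` class and its end-of-command marker protocol are hypothetical, and pointing it at a container (for example via `docker exec -i <container> bash`) is an untested assumption.

```python
import subprocess
import uuid
from typing import Tuple


class PersistentBash:
    """Hypothetical helper: one long-lived bash process, so state such as the
    working directory and exported variables survives between commands."""

    def __init__(self, argv=("bash", "--noprofile", "--norc")):
        # Assumption: for a container, argv might become
        # ("docker", "exec", "-i", "<container>", "bash"); not tested here.
        self.proc = subprocess.Popen(
            list(argv),
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so error text reaches the agent
            text=True,
        )

    def run(self, command: str) -> Tuple[str, int]:
        """Run one command in the shared shell and return (output, exit_code)."""
        marker = f"__CMD_DONE_{uuid.uuid4().hex}__"
        # After the command, print a newline, a unique marker and the exit status.
        # "$?" is expanded before printf runs, so it is the command's own status.
        self.proc.stdin.write(f"{command}\nprintf '\\n%s %s\\n' {marker} \"$?\"\n")
        self.proc.stdin.flush()
        lines = []
        while True:
            line = self.proc.stdout.readline()
            if line == "":
                raise RuntimeError("shell exited before the command finished")
            if line.startswith(marker):
                exit_code = int(line.split()[1])
                # Drop the extra newline that printf emitted before the marker.
                return "".join(lines)[:-1], exit_code
            lines.append(line)

    def close(self) -> None:
        """Close stdin so bash exits, then reap the process."""
        self.proc.stdin.close()
        self.proc.wait()
```

For example, after `sh.run("cd /tmp")`, a later `sh.run("pwd")` reports `/tmp` because both commands run in the same shell. A command that reads from stdin (such as a bare `cat`) would swallow the marker line, which is one reason an interactive-command blocklist tends to go hand in hand with this design.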
Feel free to suggest anything I missed!
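The third checklist item suggests skipping SWE-agent-style command parsing and simply returning an error when the agent invokes a blocklisted command. Below is a rough sketch of such a check. It assumes a hypothetical `BLOCKED_COMMANDS` set (the issue does not say which commands would be on it) and uses Python's `shlex` to find the program at the head of each `;`, `&&`, `||`, or `|` segment. It is not a full shell parser: for instance, it ignores `VAR=value cmd` prefixes and command substitution.

```python
import shlex
from typing import Optional

# Hypothetical blocklist: the issue does not say which commands it would contain.
BLOCKED_COMMANDS = {"vim", "vi", "nano", "emacs", "less", "more", "top"}

# Tokens after which a new command starts (rough approximation of shell syntax).
COMMAND_SEPARATORS = {";", "&&", "||", "|", "|&", "&", "("}


def check_command(command: str) -> Optional[str]:
    """Return an error message for the agent if a blocked program is invoked, else None."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True  # split on whitespace and shell punctuation only
    try:
        tokens = list(lexer)
    except ValueError as exc:  # e.g. an unclosed quote
        return f"Error: could not parse command ({exc})."
    expect_program = True
    for token in tokens:
        if token in COMMAND_SEPARATORS:
            expect_program = True
            continue
        if expect_program:
            program = token.rsplit("/", 1)[-1]  # "/usr/bin/vim" -> "vim"
            if program in BLOCKED_COMMANDS:
                return f"Error: `{program}` is not allowed in this sandbox; use another command."
            expect_program = False
    return None
```

Because only command heads are inspected, `echo vim` passes while `cat a.txt | less` is rejected; the returned string would be fed back to the agent as the observation instead of running the command.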
@rbren @xingyaoww Has anyone picked this up?
Hey @guneetsk99 , not yet! Please feel free to start taking a look at the substeps and claim ones that you like.
Correction @guneetsk99, I've started with the handcrafted tools here.
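For reference, the behavior of a `search_dir`-style tool can be sketched in a few lines. SWE-agent implements its tools as bash scripts; the Python functions below (`search_dir`, `format_matches`) are hypothetical stand-ins written for illustration, and the wording of the rendered output is an assumption rather than SWE-agent's exact format.

```python
import os
from typing import Dict


def search_dir(term: str, directory: str = ".") -> Dict[str, int]:
    """Hypothetical `search_dir`-style tool: count occurrences of `term` in each file."""
    counts = {}
    for root, dirs, files in os.walk(directory):
        dirs[:] = [d for d in dirs if not d.startswith(".")]  # skip .git and similar
        for name in files:
            path = os.path.join(root, name)
            try:
                with open(path, encoding="utf-8") as handle:
                    hits = handle.read().count(term)
            except (OSError, UnicodeDecodeError):
                continue  # skip unreadable or binary files
            if hits:
                counts[os.path.relpath(path, directory)] = hits
    return dict(sorted(counts.items()))


def format_matches(term: str, directory: str, counts: Dict[str, int]) -> str:
    """Render the counts as text an agent could read (wording is an assumption)."""
    if not counts:
        return f'No matches found for "{term}" in {directory}'
    lines = [f'Found {sum(counts.values())} matches for "{term}" in {directory}:']
    lines += [f"{path} ({hits} matches)" for path, hits in counts.items()]
    return "\n".join(lines)
```

Returning per-file counts rather than every matching line keeps the observation short, which matters when the output is pasted back into the agent's context.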
@Sparkier let's connect on Slack. Can you DM me? We can collaborate if you're OK with it.
Made a Draft PR for this: #846
It's a WIP, but it has basic functionality. I am going to work on integrating the commands from #682. Thank you @Sparkier for making those.
If anyone wants to help out, feel free.
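Similarly, the `open` and `edit` tools amount to viewing a numbered window of a file and replacing a range of its lines. Here is a minimal sketch that operates on in-memory text and assumes 1-indexed, inclusive line ranges. `view_window` and `edit_lines` are hypothetical names and do not reproduce the commands from #682.

```python
def view_window(text: str, first_line: int, size: int = 10) -> str:
    """Hypothetical `open`-style view: `size` lines from `first_line` (1-indexed), numbered."""
    lines = text.splitlines()
    window = lines[first_line - 1 : first_line - 1 + size]
    return "\n".join(f"{number}:{line}" for number, line in enumerate(window, start=first_line))


def edit_lines(text: str, start: int, end: int, replacement: str) -> str:
    """Hypothetical `edit`-style tool: replace lines start..end (1-indexed, inclusive)."""
    lines = text.splitlines(keepends=True)
    if not 1 <= start <= end <= len(lines):
        raise ValueError(f"invalid line range {start}:{end} for a {len(lines)}-line file")
    new_lines = replacement.splitlines(keepends=True)
    if new_lines and not new_lines[-1].endswith("\n"):
        new_lines[-1] += "\n"  # keep the following line on its own line
    return "".join(lines[: start - 1] + new_lines + lines[end:])
```

Showing line numbers in the view lets the agent pick the `start` and `end` values for its next edit directly from what it just read; an empty replacement deletes the range.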
SWE-agent is merged!
Unclear how it compares to the main SWE-agent; we'll see if we can get some folks to evaluate it.
Hey @rbren, I was looking back at our roadmap, and I don't think this was ready to be closed yet, because we haven't done all of the steps that @xingyaoww mentioned. We still need to test whether our implemented version of the agent gets competitive accuracy on SWE-bench, which we can do after we finish #795
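Measuring SWE-bench accuracy requires turning the agent's edits into a patch that the evaluation harness can apply and test. In practice, running `git diff` inside the task repository is probably the simplest route. The sketch below instead uses Python's `difflib` so it stays self-contained. The `make_patch` function and its path-to-contents snapshot inputs are assumptions for illustration, not OpenHands' evaluation code.

```python
import difflib
from typing import Dict


def make_patch(before: Dict[str, str], after: Dict[str, str]) -> str:
    """Build one unified diff from path -> contents snapshots taken before and
    after the agent ran. Assumes text files that end with a newline."""
    chunks = []
    for path in sorted(set(before) | set(after)):
        old, new = before.get(path, ""), after.get(path, "")
        if old == new:
            continue  # unchanged file: nothing to submit
        diff = difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            # Created or deleted files use /dev/null, as git does.
            fromfile=f"a/{path}" if path in before else "/dev/null",
            tofile=f"b/{path}" if path in after else "/dev/null",
        )
        chunks.append("".join(diff))
    return "".join(chunks)
```

Unlike `git diff`, this output has no `diff --git` header lines, so whether a given harness accepts it as-is would need checking.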
This is finished, great job @xingyaoww and @libowen2121 ! https://xwang.dev/blog/2024/opendevin-codeact-1.0-swebench/