OpenHands Add Agent Comparable to SWE-agent

These folks seem to be doing really well on the SWE-bench score!

https://swe-agent.com/

Apr 02 '24 11:04 rbren

I think a good first step would be:

[x] SWE-Agent requires a persistent docker execution (1, 2). Our todo for this would be making our docker sandbox keep all its past states, as discussed here. I am working on a version based on dockerpty, but it seems to be pretty tricky to get working (draft PR #597 - suggestions/PRs are welcome!)
[ ] I think the critical component of SWE-agent's success lies in its specific hand-crafted tools (available as commands in bash), like the search_dir, open, and edit tools. A very easy way for us to incorporate these would be: (1) find these tools and implement each one as a bash command (this is exactly how SWE-agent implements them!! - check this example), (2) pack these tools into our sandbox docker image, (3) tweak the prompt (like the one here) to let the agent know these are the additional tools it can use. I think (1) and (2) are not hard to do - we probably only need to tweak our Dockerfile to achieve this. I think (1) and (2) are good first issues, feel free to start PRs!

[ ] Besides these tools, SWE-agent has pretty sophisticated bash command parsing and block lists. Maybe we can get rid of that to simplify our solution at the early stage by directly allowing the agent to interact with the system via bash (similar to CodeActAgent's pipeline - the agent can execute whatever it wants, and we return an error message (like this) when it touches a command on the blocklist).
[ ] Hardest step: Get our SWE-Bench evaluation working and have our agents generate correct patches so that we can test them (we need some logic like this to capture the patch submissions and test them properly!). @libowen2121 is actively working on the first part (getting evaluation working) -- you can let him know if you are interested in contributing to this part!
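
To make the blocklist idea above concrete, here is a minimal sketch of "run whatever the agent sends via bash, but return an error message for blocked commands". Everything named here is an assumption for illustration: `BLOCKLIST`, `blocked_command`, and `run_in_sandbox` are hypothetical names, the listed programs are just examples of interactive tools that would hang a non-interactive shell, and this is not SWE-agent's actual list or parsing logic.

```python
import shlex
import subprocess
from typing import Optional

# Hypothetical blocklist: interactive programs that would hang a
# non-interactive shell. Not SWE-agent's real list.
BLOCKLIST = frozenset({"vim", "vi", "nano", "emacs", "less"})


def blocked_command(command: str) -> Optional[str]:
    """Return the blocked program name if the command starts with one, else None."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes etc.; let bash report the syntax error itself.
        return None
    if tokens and tokens[0] in BLOCKLIST:
        return tokens[0]
    return None


def run_in_sandbox(command: str, timeout: float = 30.0) -> str:
    """Run a command through bash, or return an error message if it is blocked."""
    name = blocked_command(command)
    if name is not None:
        # The agent sees this message as its observation instead of command output.
        return f"Error: '{name}' is on the blocklist and was not executed."
    result = subprocess.run(
        ["bash", "-c", command],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout + result.stderr
```

This only inspects the first token, so something like `cd /tmp && vim x` would slip through. That is the trade-off of skipping the more sophisticated command parsing at this stage.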

Feel free to suggest anything I missed!

Apr 02 '24 19:04 xingyaoww

@rbren @xingyaoww Did anyone pick this??

Apr 05 '24 09:04 guneetsk99

Hey @guneetsk99 , not yet! Please feel free to start taking a look at the substeps and claim ones that you like.

Apr 05 '24 15:04 neubig

Correction @guneetsk99, I've started with the handcrafted tools here.

Apr 05 '24 16:04 Sparkier

@Sparkier let's connect on Slack. Can you DM me? We can collaborate if you're OK with it.

Apr 05 '24 16:04 guneetsk99

Made a Draft PR for this: #846

It's a WIP, but it has basic functionality. I am going to work on integrating the commands from #682. Thank you @Sparkier for making those.

If anyone wants to help out feel free.

Apr 07 '24 07:04 JayQuimby

SWE-agent is merged!

Unclear how it compares to the main SWE-agent--will see if we can get some folks to eval it

Apr 17 '24 20:04 rbren

Hey @rbren, I was looking back at our roadmap, and I don't think this was ready to be closed yet, because we haven't done all of the steps that @xingyaoww mentioned. We still need to test whether our implemented version of the agent gets competitive accuracy on SWE-bench, which we can do after we finish #795

Apr 29 '24 10:04 neubig

This is finished, great job @xingyaoww and @libowen2121 ! https://xwang.dev/blog/2024/opendevin-codeact-1.0-swebench/

May 09 '24 12:05 neubig
