
Add Agent Comparable to SWE-agent

Open rbren opened this issue 10 months ago • 8 comments

These folks seem to be doing really well on the SWE-bench score!

https://swe-agent.com/

rbren avatar Apr 02 '24 11:04 rbren

I think a good first step would be:

  • [x] SWE-Agent requires persistent Docker execution (1, 2). Our to-do here is to make our Docker sandbox keep all of its past state, as discussed here. I am working on a version based on dockerpty, but it seems pretty tricky to get working (draft PR #597; suggestions and PRs are welcome!)

  • [ ] I think the critical component of SWE-agent's success lies in its specific hand-crafted tools (available as commands in bash), like the search_dir, open, and edit tools. A very easy way for us to incorporate these would be to: (1) find these tools and implement them as bash commands (this is exactly how SWE-agent implements them; check this example), (2) pack these tools into our sandbox Docker image, and (3) tweak the prompt (like the one here) to let the agent know these are additional tools it can use. I think (1) and (2) are not hard to do; we probably only need to tweak our Dockerfile to achieve this. I think (1) and (2) are good first issues, so feel free to start PRs!


  • [ ] Besides these tools, SWE-Bench has pretty sophisticated bash command parsing and blocklists. Maybe we can get rid of that to simplify our solution at the early stage by directly allowing the agent to interact with the system via bash (similar to CodeActAgent's pipeline: the agent can execute whatever it wants, and we return an error message (like this) when it touches a command on the blocklist).

  • [ ] Hardest step: get our SWE-Bench evaluation working and have our agents generate correct patches so that we can test them (we need some logic like this to capture the patch submissions and test them properly!). @libowen2121 is actively working on the first part (getting evaluation working); you can let him know if you are interested in contributing to this part!
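To make the hand-crafted tools item above concrete, here is a rough sketch of what a search_dir-style tool could do. It is written in Python purely for illustration (SWE-agent itself ships these as bash commands), and the function names, signatures, and output format are assumptions rather than SWE-agent's or our actual implementation.

```python
import os


def search_dir(term: str, directory: str = ".") -> dict[str, int]:
    """Count lines containing `term` in each readable text file under `directory`.

    Hypothetical helper: the name mirrors the tool mentioned above, but the
    behavior here is only an approximation for illustration.
    """
    matches: dict[str, int] = {}
    for root, dirs, files in os.walk(directory):
        # Prune hidden directories (e.g. .git) in place so os.walk skips them.
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        for name in files:
            path = os.path.join(root, name)
            try:
                with open(path, encoding="utf-8") as handle:
                    count = sum(1 for line in handle if term in line)
            except (UnicodeDecodeError, OSError):
                continue  # Binary or unreadable file: ignore it.
            if count:
                matches[path] = count
    return matches


def format_search_result(term: str, matches: dict[str, int]) -> str:
    """Render a short summary that could be returned to the agent as an observation."""
    if not matches:
        return f'No matches found for "{term}"'
    lines = [f'Found matches for "{term}" in {len(matches)} file(s):']
    lines += [f"{path} ({count} matches)" for path, count in sorted(matches.items())]
    return "\n".join(lines)
```

Returning a per-file count instead of every matching line keeps the observation short, which seems to be the point of giving the agent a dedicated search command rather than raw grep output.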

Feel free to suggest anything I missed!
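For the blocklist item in the list above, a simplified check might look like the sketch below. The blocklist entries, the function name, and the error wording are all hypothetical; the sketch only inspects the first token of the command, which is the kind of simplification being proposed rather than full bash parsing.

```python
import shlex
from typing import Optional

# Hypothetical blocklist: interactive programs that would hang a non-interactive shell.
BLOCKLIST = frozenset({"vim", "vi", "nano", "emacs"})


def check_command(command: str, blocklist: frozenset = BLOCKLIST) -> Optional[str]:
    """Return an error message if the command's program is blocklisted, else None."""
    try:
        tokens = shlex.split(command)
    except ValueError as exc:
        # shlex raises ValueError on malformed input such as an unclosed quote.
        return f"Could not parse command: {exc}"
    if not tokens:
        return None  # Empty command: nothing to block.
    program = tokens[0]
    if program in blocklist:
        return f"Command '{program}' is not supported in this environment."
    return None
```

A caller would run the command only when `check_command` returns `None`, and otherwise hand the message back to the agent as the observation. Note that this does not catch blocklisted programs hidden behind pipes, `&&`, or subshells.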

xingyaoww avatar Apr 02 '24 19:04 xingyaoww

@rbren @xingyaoww Has anyone picked this up?

guneetsk99 avatar Apr 05 '24 09:04 guneetsk99

Hey @guneetsk99 , not yet! Please feel free to start taking a look at the substeps and claim ones that you like.

neubig avatar Apr 05 '24 15:04 neubig

Correction @guneetsk99, I've started with the handcrafted tools here.

Sparkier avatar Apr 05 '24 16:04 Sparkier

@Sparkier let's connect on Slack. Can you DM me? We can collaborate if you're OK with it.

guneetsk99 avatar Apr 05 '24 16:04 guneetsk99

Made a Draft PR for this: #846

It's a WIP, but it has basic functionality. I am going to work on integrating the commands from #682. Thank you @Sparkier for making those.

If anyone wants to help out feel free.

JayQuimby avatar Apr 07 '24 07:04 JayQuimby

SWE-agent is merged!

It's unclear how it compares to the main SWE-agent. We'll see if we can get some folks to eval it.

rbren avatar Apr 17 '24 20:04 rbren

Hey @rbren, I was looking back at our roadmap, and I don't think this was ready to be closed yet, because we haven't done all of the steps that @xingyaoww mentioned. We still need to test whether our implemented version of the agent gets competitive accuracy on SWE-bench, which we can do after we finish #795.

neubig avatar Apr 29 '24 10:04 neubig

This is finished, great job @xingyaoww and @libowen2121 ! https://xwang.dev/blog/2024/opendevin-codeact-1.0-swebench/

neubig avatar May 09 '24 12:05 neubig