[Mega Issue] Agent Quality: Editing

Open xingyaoww opened this issue 1 year ago • 3 comments

What problem or use case are you trying to solve?

OpenDevin hits a bunch of issues when trying to perform edits:

  • [ ] https://github.com/OpenDevin/OpenDevin/issues/3214
  • [ ] Repeatedly failed to edit: https://opendevin.slack.com/archives/C06R25BT5B2/p1722547184048369

Describe the UX of the solution you'd like

Integrate or build a lightweight evaluation for editing:

  • [ ] https://github.com/OpenDevin/OpenDevin/issues/3183
  • [ ] https://github.com/OpenDevin/OpenDevin/pull/2685#discussion_r1659651873
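As a rough illustration of what a lightweight editing evaluation could look like, here is a minimal sketch. It is not OpenDevin's harness: `EditCase`, `apply_edit`, and `evaluate` are hypothetical names, and a plain search/replace stands in for whatever edit the agent would actually propose. It only measures how often an edit applies cleanly and produces exactly the expected file content.

```python
from dataclasses import dataclass


@dataclass
class EditCase:
    """One hypothetical evaluation case: a file, a requested edit, and the expected result."""
    original: str
    search: str
    replace: str
    expected: str


def apply_edit(text: str, search: str, replace: str) -> str:
    """Apply a search/replace edit, failing if the target is missing or ambiguous."""
    count = text.count(search)
    if count != 1:
        raise ValueError(f"expected exactly one match, found {count}")
    return text.replace(search, replace, 1)


def evaluate(cases: list[EditCase]) -> float:
    """Return the fraction of cases whose edited text exactly matches the expected text."""
    if not cases:
        return 0.0
    passed = 0
    for case in cases:
        try:
            result = apply_edit(case.original, case.search, case.replace)
        except ValueError:
            continue  # an edit that cannot be applied counts as a miss
        if result == case.expected:
            passed += 1
    return passed / len(cases)
```

Rejecting ambiguous matches is a deliberate choice in this sketch: an edit that could land in more than one place is treated as a failure rather than silently applied to the first match.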

(Longer-term) Collect data and train a specialized model for editing

  • [ ] https://github.com/OpenDevin/OpenDevin/issues/1854

Do you have thoughts on the technical implementation?

Describe alternatives you've considered

Additional context

xingyaoww • Aug 02 '24 18:08

Here is an example of a prompt that causes lots of editing issues: https://www.all-hands.dev/share-opendevin?share_id=4e4841bd6240dc2ee334742dee59f5104b5c982ff9309aaeea5ce977968bfcf3

neubig • Aug 13 '24 00:08

Related issue: https://github.com/OpenDevin/OpenDevin/issues/3412

li-boxuan • Aug 16 '24 01:08

Related issue: #3452

tobitege • Aug 20 '24 05:08

@neubig @tobitege @li-boxuan @xingyaoww

Editing can be implemented as Terminal or GUI agent operations. For example:

  • Instruct a Terminal agent to use an editor like nano or vim
  • Instruct a GUI agent to use an IDE like VS Code or JetBrains IDEA

Cybergod is implementing this kind of agent.

If you want to find some legacy agentic editors, check code here and docs here.

James4Ever0 • Aug 28 '24 09:08

Interesting, @James4Ever0! Have you compared the accuracy of something like this with other methods, such as the ones implemented in OpenHands or SWE-Agent?

neubig • Sep 04 '24 22:09

No benchmarking has been done yet. This is just a library for terminal interaction; the actual terminal agent implementation, which is the most important factor in performance, is still on my TODO list.

This method is promising since it allows direct interaction with almost all terminal programs, including those rendering complex graphics (like gameplay and QR codes) and layouts. So if there is anyone interested in implementing this terminal agent, I will do my best to answer all questions about my code and related topics.

James4Ever0 • Sep 04 '24 23:09

We've made a lot of good progress on this generally. @xingyaoww or @enyst do we feel like this is resolved?

mamoodi • Dec 05 '24 16:12

Let's close it for now: editing has gotten a lot better since the new Sonnet.

xingyaoww • Dec 05 '24 16:12