[Mega Issue] Agent Quality: Editing
What problem or use case are you trying to solve?
OpenDevin hits a bunch of issues when trying to perform edits:
- [ ] https://github.com/OpenDevin/OpenDevin/issues/3214
- [ ] Repeatedly failed to edit: https://opendevin.slack.com/archives/C06R25BT5B2/p1722547184048369
Describe the UX of the solution you'd like
Integrate / Build a light-weight evaluation for editing:
- [ ] https://github.com/OpenDevin/OpenDevin/issues/3183
- [ ] https://github.com/OpenDevin/OpenDevin/pull/2685#discussion_r1659651873
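As a rough illustration of what a lightweight editing evaluation could look like, here is a minimal sketch in Python. It is not taken from OpenDevin or from the linked issues. `EditCase`, `evaluate_editor`, and `toy_replace_editor` are hypothetical names, and the `old=>new` instruction format is an assumption made only to keep the example self-contained. A real harness would call the agent's edit action in place of the toy editor.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EditCase:
    """One hypothetical evaluation case: file contents, an instruction, and the expected result."""
    name: str
    original: str
    instruction: str
    expected: str


def evaluate_editor(edit_fn: Callable[[str, str], str], cases: List[EditCase]) -> dict:
    """Run edit_fn on every case and report exact-match accuracy."""
    failures = []
    for case in cases:
        try:
            result = edit_fn(case.original, case.instruction)
        except Exception as exc:  # an edit that raises counts as a failed edit
            failures.append((case.name, f"error: {exc}"))
            continue
        if result != case.expected:
            failures.append((case.name, "mismatch"))
    total = len(cases)
    passed = total - len(failures)
    return {
        "total": total,
        "passed": passed,
        "accuracy": passed / total if total else 0.0,
        "failures": failures,
    }


def toy_replace_editor(original: str, instruction: str) -> str:
    """Stand-in for an agent (assumption): instruction is 'old=>new', applied once."""
    old, new = instruction.split("=>", 1)
    if old not in original:
        raise ValueError("search text not found")
    return original.replace(old, new, 1)


CASES = [
    # The search text exists, so the edit should succeed.
    EditCase("rename", "x = 1\nprint(x)\n", "x = 1=>y = 1", "y = 1\nprint(x)\n"),
    # The search text is absent, so the toy editor raises and the case fails.
    EditCase("missing", "a = 2\n", "b = 3=>b = 4", "a = 2\n"),
]

report = evaluate_editor(toy_replace_editor, CASES)
print(report)
```

Exact-match scoring is the simplest possible check. A fuller evaluation might instead run tests or a linter against the edited file, since several different edits can be equally correct.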
(Longer-term) Collect data and train a specialized model for editing
- [ ] https://github.com/OpenDevin/OpenDevin/issues/1854
Do you have thoughts on the technical implementation?
Describe alternatives you've considered
Additional context
Here is an example of a prompt that causes lots of editing issues: https://www.all-hands.dev/share-opendevin?share_id=4e4841bd6240dc2ee334742dee59f5104b5c982ff9309aaeea5ce977968bfcf3
Related issue: https://github.com/OpenDevin/OpenDevin/issues/3412
Related issue: #3452
@neubig @tobitege @li-boxuan @xingyaoww
Editing can be implemented as Terminal or GUI agent operations. For example:
- Instruct a Terminal agent to use an editor like nano or vim
- Instruct a GUI agent to use an IDE like VSCode or JetBrains IDEA
Cybergod is implementing these kinds of agents.
If you want to find some legacy agentic editors, check code here and docs here.
Interesting @James4Ever0 ! Have you compared the accuracy of something like this compared to other methods like the ones implemented in OpenHands or SWE-Agent?
Benchmarking has not been done yet. This is just a library for terminal interaction; the actual terminal agent implementation, which is the most important factor in performance, is still on my TODO list.
This method is promising since it allows direct interaction with almost all terminal programs, including those rendering complex graphics (like gameplay and QR codes) and layouts. So if there is anyone interested in implementing this terminal agent, I will do my best to answer all questions about my code and related topics.
We've made a lot of good progress on this generally. @xingyaoww or @enyst do we feel like this is resolved?
Let's close it for now, since editing has gotten a lot better since the new Sonnet.