
Create Aider Agent

Open thiswillbeyourgithub opened this issue 1 year ago • 5 comments

Summary There's an open source AI pair programming tool called aider that implements something that may interest you: a set of Python classes and functions that ask the LLM to output only the diff to apply instead of rewriting the whole file. This both reduces the chance of errors and greatly reduces the number of tokens to generate (importantly: completion tokens are far more expensive than prompt tokens).

Motivation Reduce token cost and errors.

Technical Design A report showcasing their work can be found here. Most of the code is here and the prompts are here. As you can see, a lot of thought went into this, because the LLM otherwise has trouble with the number of lines and so on.
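For a rough idea, the edit format aider prompts the model for looks something like this (a simplified sketch; the file name and code here are made up):

```
greeting.py
<<<<<<< SEARCH
def greet(name):
    print("Hello " + name)
=======
def greet(name: str) -> None:
    print(f"Hello {name}")
>>>>>>> REPLACE
```

The model only has to reproduce the lines being changed, so the completion stays short even for a large file.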

Alternatives to Consider None that I know of.

Additional context For a personal project I inquired about using only aider's functions; you can read the issue here. Also, hearing about OpenDevin led me to devika, so I'll be posting this exact same issue on their repo too.

thiswillbeyourgithub avatar Mar 24 '24 11:03 thiswillbeyourgithub

This is really interesting--thanks for bringing it up!

The improvement from using diffs is impressive. But I imagine the logic for applying them is...messy. This would be really interesting for an agent to try out

rbren avatar Mar 24 '24 22:03 rbren

I also really like the idea of the user telling the agent which files to focus on!

rbren avatar Mar 24 '24 22:03 rbren

Glad you like it!

messy

To me, on the contrary, it seems cleaner, especially for long files and projects.

Also, things like symbex for Python seem very promising for getting a bird's-eye view of a project by seeing only the signature of each function, like a human would. I'm sure there exist other general parsers for multiple languages.
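For illustration, something like this already gives that outline (a minimal sketch using Python's stdlib ast module, not symbex itself; the file name is made up):

```python
import ast
from pathlib import Path

def list_signatures(path: str) -> list[str]:
    """Return 'name(arg, ...)' for every function defined in a Python file."""
    tree = ast.parse(Path(path).read_text())
    sigs = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            sigs.append(f"{node.name}({args})")
    return sigs

# Example: print an outline of a file
for sig in list_signatures("some_module.py"):
    print(sig)
```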

But the diff thing is a priority in my mind.

thiswillbeyourgithub avatar Mar 25 '24 08:03 thiswillbeyourgithub

As a user of aider.chat I love it the most out of all the tools available. One big area where aider also adds value is working with an existing codebase. Can we prioritize this as well in the OpenDevin project? I guess you already mentioned this, but aider.chat is a tool that can be used on an existing repository and understands all the symbols in it: it creates a repository map of all the symbols, which is extremely good at pinpointing the changes.

cloudbow avatar Mar 28 '24 10:03 cloudbow

This blog about how they use tree-sitter to build a graph of the repo/code is also really interesting/useful:

  • https://aider.chat/docs/repomap.html
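For a rough feel of what that symbol extraction looks like (a minimal sketch, not aider's actual RepoMap code; it assumes the py-tree-sitter bindings plus the tree_sitter_languages grammar bundle are installed, and the file name is made up):

```python
from pathlib import Path

from tree_sitter_languages import get_parser  # assumed dependency

def top_level_defs(path: str) -> list[str]:
    """Collect the names of top-level function and class definitions in a Python file."""
    parser = get_parser("python")
    source = Path(path).read_bytes()
    tree = parser.parse(source)
    names = []
    for node in tree.root_node.children:
        if node.type in ("function_definition", "class_definition"):
            name_node = node.child_by_field_name("name")
            if name_node is not None:
                names.append(source[name_node.start_byte:name_node.end_byte].decode())
    return names

print(top_level_defs("some_module.py"))
```

Per the blog post above, aider's repo map goes further and ranks these symbols by how they are referenced across the repo, so the map fits in a token budget.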

0xdevalias avatar Apr 05 '24 01:04 0xdevalias

Our new OpenDevin CodeAct agent implements some of the tools from SWE-Agent that make it possible to do many of the things that aider supports. If there is interest in implementing an aider agent we'd be happy to have contributions, but I'm going to close the issue as unplanned for now unless someone is interested in doing this!

neubig avatar May 11 '24 15:05 neubig

watching the llamaindex webinar with @rbren now - i think an aider "microagent" would be insanely powerful.

specifically - i think it could help with some of the context window mgmt challenges. paul has put an insane amount of work into refining the diff structure within aider. and if aider is just a tool or a micro-agent, then the parent agent can just see whether things worked and doesn't necessarily need to be bothered with the details of what the aider tool / microagent did.

rawwerks avatar May 24 '24 22:05 rawwerks

  • https://aider.chat/2024/05/22/swe-bench-lite.html
    • Aider scored 26.3% on the SWE Bench Lite benchmark, achieving a state-of-the-art result. The current top leaderboard entry is 20.3% from Amazon Q Developer Agent. The best result reported elsewhere seems to be 25% from OpenDevin.

  • https://www.swebench.com/
  • https://github.com/paul-gauthier/aider-swe-bench
    • Harness used to benchmark aider against SWE Bench benchmarks

    • https://github.com/paul-gauthier/aider-swe-bench#the-aider-agent
      • The "aider agent" The "aider agent" is dead simple. It simply invokes aider on a fresh copy the problem's git repo over and over, iterating through the models it's been told to use. Aider is invoked repeatedly until aider reports that it successfully edited the repo without any outstanding edit, lint or test errors. This is a plausible solution, so the agent is done.

        Aider is configured with a test command to run all the pre-existing tests in the problem's repo. Aider is also configured to proceed with all its suggested actions without any user approval.
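A rough sketch of that loop (purely illustrative; the aider CLI flags and the success check here are assumptions, not the actual harness code):

```python
import subprocess

def run_aider_agent(repo_dir: str, problem: str, models: list[str], max_tries: int = 6) -> bool:
    """Invoke aider repeatedly until it reports a clean edit, or give up (sketch)."""
    for attempt in range(max_tries):
        model = models[attempt % len(models)]
        # Flag names are assumptions; check `aider --help` for the real interface.
        result = subprocess.run(
            ["aider", "--yes", "--model", model, "--message", problem],
            cwd=repo_dir,
        )
        if result.returncode == 0:  # stand-in for "no outstanding edit/lint/test errors"
            return True  # plausible solution, the agent is done
        # The real harness resets to a fresh copy of the repo before retrying.
    return False
```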

See also:

  • #742
  • https://github.com/paul-gauthier/aider/issues/534

0xdevalias avatar May 30 '24 02:05 0xdevalias

Yeah aider's benchmark score is insanely high! We should definitely incorporate aider (in some form).

li-boxuan avatar May 30 '24 04:05 li-boxuan

@rbren @deniz-birlikci and I were talking about the logistics of doing this on slack, here are some details:

Regarding benchmark scores, aider retries over and over, up to 6 times, if it doesn't come up with a solution that passes lint and the existing tests. So the scores are actually a bit lower (~20%) if they only try once. But I think aider definitely has some good ideas incorporated, so it's worth trying.

From @rbren:

We might want to pull from Aider in a piecemeal way, rather than importing it as a dependency (we actually can't add it right now due to a conflict in playwright versions anyways--I guess they're working on browsing?). Some ideas:

  • Add RepoMap to the State object, maybe using the Aider class. Then other agents can take advantage of it
  • Implement an EditBlockCoder agent (sketched below), which
    • takes in a task that describes the edits to be made
    • reads the necessary file
    • prompts the LLM in EditBlock format
    • translates the response into bash for SEARCH/REPLACE or similar
  • Pull in the linting functionality
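A minimal sketch of the apply step for that EditBlockCoder idea (the markers mirror aider's SEARCH/REPLACE format, but the parsing here is a simplification rather than aider's implementation):

```python
import re
from pathlib import Path

EDIT_BLOCK = re.compile(
    r"<<<<<<< SEARCH\n(.*?)\n=======\n(.*?)\n>>>>>>> REPLACE",
    re.DOTALL,
)

def apply_edit_blocks(path: str, llm_response: str) -> None:
    """Apply every SEARCH/REPLACE block in the LLM response to the given file."""
    text = Path(path).read_text()
    for search, replace in EDIT_BLOCK.findall(llm_response):
        if search not in text:
            raise ValueError(f"SEARCH text not found verbatim in {path}:\n{search}")
        text = text.replace(search, replace, 1)
    Path(path).write_text(text)
```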

neubig avatar May 30 '24 10:05 neubig

We need this after it passes more than 40 % on SWE Bench

PierrunoYT avatar Jun 30 '24 10:06 PierrunoYT

We need this after it passes more than 40 % on SWE Bench

@PierrunoYT What is significant about 40%?

0xdevalias avatar Jul 01 '24 02:07 0xdevalias

We need this after it passes more than 40 % on SWE Bench

Its name is Aide, seems not the same as Aider? @PierrunoYT [image]

assertion avatar Jul 01 '24 02:07 assertion

Its name is Aide, seems not the same as Aider?

Definitely seems to be different to aider.

Context:

  • https://github.com/codestoryai/swe_bench_traces
    • At CodeStory, we are building Aide, a new age editor made for working along with agents. Unlike AI engineers which throw users out of the loop and chat/copilots which are very much triggered by humans, we envision an editor where agents and developers come together to hack and collaborate.

      At the time of this commit, the agentic framework powering Aide scores 40.3% setting a new benchmark on SWE-Bench-Lite

    • https://github.com/codestoryai/swe_bench_traces/issues/1
  • https://codestory.ai/
    • We believe we now have the opportunity and necessity to fundamentally re-imagine the editor to be a place where both humans and AI can work together.

      Our attempt at this mighty goal, is Aide. We're building an editor that bridges the present and the future — equipped to help developers effectively leverage AI in their workflows today, while paving the way for how we imagine programming with AI will look in the future.

  • https://aide.dev/
    • Aide lets you pick an infra provider and model of choice, add your API key and just start coding. All queries made to the model are available to you in a SQLite DB locally, and our prompts are Open Source.

    • https://github.com/codestoryai/prompts
      • Contains the prompts we use to talk to various LLMs for different utilities inside the editor

I couldn't see a PR submission for aide's results here though:

  • https://github.com/swe-bench/experiments

And since I hadn't seen MentatBot on the SWE-Bench leaderboard either, here's the blog link + results submission PR for it:

  • https://mentat.ai/blog/mentatbot-sota-coding-agent
  • https://github.com/swe-bench/experiments/pull/34

0xdevalias avatar Jul 01 '24 03:07 0xdevalias

Also, things like symbex for Python seem very promising for getting a bird's-eye view of a project by seeing only the signature of each function, like a human would. I'm sure there exist other general parsers for multiple languages.

This blog about how they use tree-sitter to build a graph of the repo/code is also really interesting/useful:

Stack graphs may also help in the 'code search/context' space of things (similar to aider's repo map/etc); it's what powers GitHub's smart code navigation features:

  • #742

0xdevalias avatar Jul 01 '24 03:07 0xdevalias

Here is a link to a twitter thread explaining it: https://x.com/skcd42/status/1806640696662675469

neubig avatar Jul 01 '24 03:07 neubig

@assertion @0xdevalias Yeah I wrote it wrong and forgot to edit it.

PierrunoYT avatar Jul 01 '24 05:07 PierrunoYT

I think we can probably close this issue in favor of the more concrete #2185, #2220, and #2221.

neubig avatar Jul 03 '24 05:07 neubig