aider Improve SEARCH/REPLACE accuracy (fixed)

LLMs, including GPT-4o, often provide a very short context for the SEARCH block to match.

For example:

<<<<<<< SEARCH
}
=======
// some long code block
>>>>>>> REPLACE

In this case, the current Aider code simply uses the first match for the lone }, which is very likely the wrong place to edit.

This issue could be mitigated by prompt engineering, and we may need to do that as well. However, in my opinion, we should get everything we can out of conventional (non-LLM) code before relying on LLMs.

This PR handles the case of multiple perfect matches: at the very least, it is safe to reject an edit when its SEARCH block matches perfectly in more than one place.
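To make the idea concrete, here is a minimal sketch of rejecting an ambiguous SEARCH block. It is not the code from this PR or from Aider; the function name replace_if_unique and the error messages are hypothetical, and it only considers exact string matches.

```python
def replace_if_unique(content: str, search: str, replace: str) -> str:
    """Apply one SEARCH/REPLACE edit only if SEARCH matches exactly once.

    Hypothetical helper for illustration; not part of Aider's API.
    """
    if not search:
        raise ValueError("SEARCH block is empty")

    first = content.find(search)
    if first == -1:
        raise ValueError("SEARCH block failed to match")

    # Look for a second occurrence starting one character later, so that
    # overlapping matches are also treated as ambiguous.
    if content.find(search, first + 1) != -1:
        raise ValueError("SEARCH block matches more than one location")

    return content[:first] + replace + content[first + len(search):]
```

With a lone closing brace as the SEARCH text, any file containing two or more braces is rejected instead of being edited at the first one, and the caller can then ask the LLM for a SEARCH block with more context.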

A test has been added as well.

(The previous draft PR at https://github.com/paul-gauthier/aider/pull/677 was accidentally closed. I force-pushed as it was my repo's branch :) )

Jun 15 '24 18:06 youknow04

I have used this PR myself and found that it still has trouble with multiple SEARCH/REPLACE blocks. When multiple SEARCH/REPLACE blocks exist, Aider seems to partially apply some of the blocks, which causes trouble in the second round of request-response.

Formally, let C0 be the initial code, and C1 be the code after Aider has partially updated C0. In the second turn with this PR, the LLM responds with SEARCH/REPLACE blocks based on C0 again, but the codebase is already C1, which causes the "SEARCH/REPLACE blocks failed to match!" error.

Either the ValueError path here should prompt the LLM about the partial update, or an all-or-nothing apply strategy is required.
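As an illustration of the all-or-nothing option, the sketch below stages every edit on a copy and only returns the result if all blocks apply. It is a simplified assumption of how this could work, not Aider's implementation: the function apply_all_or_nothing, the dict of file contents, and the (path, search, replace) tuples are all hypothetical.

```python
def apply_all_or_nothing(files: dict, edits: list) -> dict:
    """Return updated file contents only if every edit applies cleanly.

    files: mapping of path -> file content (hypothetical in-memory model).
    edits: list of (path, search, replace) tuples, applied in order.
    Raises ValueError on the first failing block; `files` is never mutated.
    """
    staged = dict(files)  # work on a copy so a failure leaves C0 intact

    for index, (path, search, replace) in enumerate(edits, start=1):
        text = staged.get(path)
        if text is None:
            raise ValueError(f"block {index}: unknown file {path}")

        first = text.find(search) if search else -1
        if first == -1:
            raise ValueError(f"block {index}: SEARCH failed to match in {path}")
        if text.find(search, first + 1) != -1:
            raise ValueError(f"block {index}: SEARCH is ambiguous in {path}")

        staged[path] = text[:first] + replace + text[first + len(search):]

    return staged
```

Because the caller only writes the returned contents to disk on success, a failed response leaves the codebase at C0, so a retry based on C0 still matches.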

@paul-gauthier, what do you think?

Jun 17 '24 12:06 youknow04

@paul-gauthier I would like to complete this PR with minimal modification to Aider's previous behavior.

The goals are to:

Partially re-request diffs if multiple perfect matches (SEARCH blocks) exist in the code
Improve speed
Run benchmarks before and after the PR

I want to avoid further conflicts due to the delayed merge, so please confirm if this approach aligns with your vision for improving Aider.

Note that this is an especially painful bug for brace-heavy languages like Rust when used with GPT-4o, or for test code with many duplications. It can be hard to notice because Aider silently applies the edit at the first match, which can be the wrong one in context.

Aug 15 '24 14:08 youknow04

Aider used to do this, but the benchmark results indicate it's better not to.

Sep 10 '24 21:09 paul-gauthier

@paul-gauthier I've noticed that the number of duplicate entries created by Aider is a significant problem, and one that is unique to Aider compared with other AI-enhanced code creation/editing tools and methods.

I'm not sure where this should be prevented: wider search blocks, better linting, or even IDE-level "Problems" interaction... but it seems something needs to be done to fix this issue.

Commenting here because this seems related, but I'm happy to move the discussion to whichever venue you think is most appropriate.

Sep 15 '24 21:09 WilliamAGH

This is similar to, but not exactly the same issue as, Aider inserting placeholder blocks like these, which it has been doing at a higher rate in recent weeks:

_# ... (rest of the code block)_
_# ... (rest of the Dockerfile)_

Sep 15 '24 21:09 WilliamAGH