
Failed to apply edit

j4yh3m opened this issue · 6 comments

Amazing project! I love where this is going, this is the future of development!

I'm trying the 'pong' example, and it seems to get confused a lot but makes slow progress. At some point, when I ask it to fix the scores not being incremented, it systematically fails to apply the edits to the file.

```
main.py
<<<<<<< ORIGINAL
# Main game loop
running = True
while running:
    # Set up the scores
    left_score = 0
    right_score = 0
=======
# Set up the scores
left_score = 0
right_score = 0

# Main game loop
running = True
while running:
>>>>>>> UPDATED
```

I tried clearing the chat and starting from scratch, but it always seems to get stuck applying that simple change. It has no issue making other, more complex changes in the same file; it only fails on that particular change.

I will look into it and submit a PR if I find the issue.

j4yh3m · Jul 03 '23

I'm also getting Failed to apply edit to aider/coders/base_coder.py without much context about why it failed. The proposed diff looks sane. Perhaps the diff cannot be applied due to some whitespace characters?

emsi · Jul 03 '23

> I'm also getting Failed to apply edit to aider/coders/base_coder.py without much context about why it failed. The proposed diff looks sane. Perhaps the diff cannot be applied due to some whitespace characters?

I thought about whitespace too, and even asked aider to produce a test case for it (pretty amazing what it can do!), but in my case it seems that the diff isn't being constructed correctly.
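
The whitespace theory is easy to check with a tiny test, in the spirit of the one described above. This is a hypothetical illustration, not the test aider generated: exact substring matching is whitespace-sensitive, so a single stray trailing space is enough to break it.

```python
def test_trailing_whitespace_breaks_exact_match():
    source = "left_score = 0\nright_score = 0\n"
    # Same code, but the first line carries a trailing space.
    original = "left_score = 0 \nright_score = 0\n"
    # Exact matching is whitespace-sensitive, so the lookup fails.
    assert original not in source
```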

j4yh3m · Jul 03 '23

So the issue is that aider skips a whole bunch of lines in the ORIGINAL block of the change, and therefore it cannot match it to the contents of the file when trying to apply the change.

```
main.py
<<<<<<< ORIGINAL
# Main game loop
running = True
while running:
    ... <---------------------- lots of lines missing here!
    # Set up the scores
    left_score = 0
    right_score = 0
=======
# Set up the scores
left_score = 0
right_score = 0

# Main game loop
running = True
while running:
>>>>>>> UPDATED
```
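
To see why a skipped span like that is fatal, here is a minimal sketch of how an ORIGINAL/UPDATED block could be applied via an exact substring match. This is a simplification for illustration, not aider's actual implementation:

```python
def apply_edit_block(source: str, original: str, updated: str) -> str:
    """Replace the first exact occurrence of `original` with `updated`."""
    if original not in source:
        # The failure mode above: if the model elided lines with "..."
        # inside the ORIGINAL block, no exact substring of the file can
        # ever match it, and the edit is rejected.
        raise ValueError("Failed to apply edit: ORIGINAL block not found")
    return source.replace(original, updated, 1)
```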

j4yh3m · Jul 03 '23

It's actually quite challenging to convince even GPT-4 to always return properly formatted code edits. As you've noticed, it sometimes just skips a bunch of code with "..." even though the system prompt strongly disallows that. It can make other mistakes sometimes.

If you see that it has returned a bad edit that fails to apply, you can reply and tell it to try again. Saying something like this might help: "your edit block is missing a bunch of code that you replaced with ..., please give me a proper edit that doesn't skip over any code".

Those original/updated sections are called "edit blocks" and you can use that phrase to talk to GPT about mistakes it made with them.

paul-gauthier · Jul 04 '23

Thanks for the suggestions @paul-gauthier !

I haven't had much time to dig into the aider code yet, but I'll see if I can improve the prompts a bit. Have you tried chain-of-thought (CoT) or tree-of-thought (ToT) prompting, or asking it to review its own changes?

I know that the main issue with more complex prompts and agents will be keeping the cost of API calls at an acceptable level, as it's already quite expensive to iterate over changes with GPT-4. Maybe some of the verification tasks could be done with GPT-3.5 instead, reserving GPT-4 for the more complex tasks and code generation.

I notice that even GPT-4 sometimes has a lot of trouble performing simple changes, e.g. it will use a Python function before declaring it, or forget to declare some variables. Maybe it's related, and it could mean that it needs much more detailed instructions on what to do and what to avoid (though I'm pretty sure you must have tried many things already).

j4yh3m · Jul 05 '23

Ya, I have spent a fair amount of effort improving and refining the prompts and edit formats that aider uses with GPT-3.5 and 4.

It can get tricky making ad-hoc prompt changes, because they often help in some situations and hurt in others. I ended up building a benchmark so that I could quantitatively measure how prompting changes affected the overall code editing performance.

I wrote up some notes on the results. If you're thinking about experimenting with prompts, it might be useful background.

https://aider.chat/docs/benchmarks.html
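
At its core, this kind of benchmark reduces to a pass rate over a suite of exercises. A back-of-the-envelope sketch, where run_exercise is a hypothetical harness hook rather than anything in aider:

```python
def pass_rate(exercises, run_exercise) -> float:
    """Fraction of benchmark exercises whose tests pass after editing."""
    passed = sum(1 for ex in exercises if run_exercise(ex))
    return passed / len(exercises)
```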

paul-gauthier · Jul 06 '23

> If you see that it has returned a bad edit that fails to apply, you can reply and tell it to try again. Saying something like this might help: "your edit block is missing a bunch of code that you replaced with ..., please give me a proper edit that doesn't skip over any code".

Sometimes the reason for the diff failure is not obvious. It would help if we could see the underlying error message.

emsi · Jul 06 '23

> Sometimes the reason for the diff failure is not obvious. It would help if we could see the underlying error message.

The error message would just be that it didn't find a match for the ORIGINAL block in the source file.
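
A friendlier diagnostic could at least point at the nearest near-miss in the file. A rough sketch of such a helper, assuming the exact-match scheme sketched earlier (illustrative only, not aider's code):

```python
import difflib

def explain_failed_match(source: str, original: str) -> str:
    """Report the region of `source` that best matches the ORIGINAL block."""
    matcher = difflib.SequenceMatcher(None, source, original)
    m = matcher.find_longest_match(0, len(source), 0, len(original))
    if m.size == 0:
        return "No part of the ORIGINAL block appears in the file."
    snippet = source[m.a : m.a + m.size]
    return f"ORIGINAL block not found; closest matching region:\n{snippet}"
```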

j4yh3m · Jul 06 '23

> Ya, I have spent a fair amount of effort improving and refining the prompts and edit formats that aider uses with GPT-3.5 and 4.

Did you check out Microsoft's Guidance? This could be a way... https://github.com/microsoft/guidance

j4yh3m · Jul 06 '23

Guidance really only fully works with local models. The OpenAI functions API is the closest thing available for remote models, and per the benchmarks I shared above it harms rather than helps code editing performance, unfortunately.

paul-gauthier · Jul 06 '23

> Guidance really only fully works with local models. The OpenAI functions API is the closest thing available for remote models, and per the benchmarks I shared above it harms rather than helps code editing performance, unfortunately.

What do you mean? I'm using Guidance with GPT-4 and other OpenAI models, and I haven't had many issues so far, apart from a few bugs.

j4yh3m · Jul 06 '23

Sorry, maybe the distinction is chat models versus completion models. When using Guidance with a chat model, you can't force it to obey a detailed structure like a JSON schema. Are you doing something like that? Perhaps I misread the Guidance docs?

https://github.com/microsoft/guidance#chat-see-also-this-notebook

But also see my benchmark results linked above, which aren't supportive of "forcing" the LLM to use a complex JSON output format.
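
For concreteness, "forcing" a structured edit through the functions API (as it existed in mid-2023) would look roughly like the sketch below. The apply_edit schema is hypothetical, purely for illustration; it is not something aider uses:

```python
import openai  # openai-python 0.x, mid-2023 API

edit_fn = {
    "name": "apply_edit",  # hypothetical function schema
    "description": "Replace a block of code in a file",
    "parameters": {
        "type": "object",
        "properties": {
            "path": {"type": "string"},
            "original": {"type": "string"},
            "updated": {"type": "string"},
        },
        "required": ["path", "original", "updated"],
    },
}

response = openai.ChatCompletion.create(
    model="gpt-4-0613",
    messages=[{"role": "user", "content": "Fix the score increment in main.py"}],
    functions=[edit_fn],
    function_call={"name": "apply_edit"},  # force the model to emit this structure
)
```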

paul-gauthier · Jul 06 '23

@paul-gauthier I found the same thing with the function calling API: the structured output is definitely worse. I tried it on another project and compared it to the text-based prompting I had been using. I first noticed this when I was using fine-tuning to get a certain type of output, and that also didn't work as well as plain text prompting.

If fine-tuning is indeed what powers function calling under the hood, then fine-tuning seems to be the underlying cause. My theory is that fine-tuning constrains the output too much, causing the model to produce lower-quality results: by limiting the model to certain types of output, it cuts off a large portion of otherwise usable output. Maybe OpenAI's fine-tuning data was insufficient to capture the wide range of use cases we put it to.

shootdaj · Jul 07 '23

@paul-gauthier I also encountered this issue and would like to see how we can fix it. I tried your suggestion (https://github.com/paul-gauthier/aider/issues/34#issuecomment-1620273234), but then it asks me to paste the entire contents of the file into the command prompt, which makes this much less usable and error-prone, since the formatting of the file, newlines and all, needs to be preserved, which is notoriously hard in a command-line UI.

I'd like to try out a different prompt for getting the diff. Could you help me figure out which line of code I would need to change to send a different prompt to OpenAI?

shootdaj · Jul 07 '23

Aider is set up to support different system prompts and edit formats in a modular way. If you look in the aider/coders subdirectory, you'll see there's a base coder with base prompts, and then there are five different specific coders.

While it's not yet documented how to add new coders, you may be able to look at the ones that are there and figure out how to do that. You could modify one of the existing coders if that's easy, or you could create a new coder to meet your needs.

paul-gauthier · Jul 07 '23

I'm going to close this issue for now, but please feel free to re-open if you have further problems.

paul-gauthier · Jul 08 '23

Absolutely awesome project. I had a similar idea but no clue how to implement it, and you pulled it off! For me, the issue was that whatever I typed into the aider prompt failed to apply, leaving me stuck like the others. But I found a workaround that is likely useful for everyone else too: paste the failing file into GPT-4 and instruct it to merge in the git diff that was failing. After updating the file, aider created the commit prompt, and after acknowledging with "y" I could proceed as normal.

Maria-Eltango · Jul 10 '23