aider latest, not working as well with gpt-3.5

latest, not working

https://asciinema.org/a/WhosqpVZAZtk9PFFQbW92fw3v

commit https://github.com/paul-gauthier/aider/commit/cdf8f9a4b2b4a65993227ac5af1eaf3f1b85c9d8 works

https://asciinema.org/a/lRM3pT49q99Od6Ft1ZpMunZQ0

I've had https://github.com/paul-gauthier/aider/commit/cdf8f9a4b2b4a65993227ac5af1eaf3f1b85c9d8 write around 8k lines during the weekend and it consistently works.

Originally posted by @alexanderkjeldaas in https://github.com/paul-gauthier/aider/issues/72#issuecomment-1628183723

Jul 10 '23 13:07 paul-gauthier

Thanks for trying aider and it's great to hear you've been able to get 8K lines of code out of it already! And thanks for providing such a clear and detailed issue describing the problem you're having.

GPT-3.5 is quite unreliable, and often disobeys the system prompt instructions. When that happens, you get results like what you've shown in your "not working" screencast. Even worse, if you "tweak" the system prompt to try and make things work better for one codebase, you may end up making it worse for (many) other codebases.

I spend a lot of effort trying to make GPT-3.5 produce good output and make sure I don't introduce regressions with new releases of aider. As part of that, I've developed a fairly extensive benchmarking suite. It uses aider to try and solve 133 coding exercises, and keeps track of how many of them pass all unit tests. This lets me objectively compare whether new versions of aider are better or worse at coding than old versions.
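To illustrate the kind of comparison described above, here is a minimal sketch, not aider's actual benchmark harness. It assumes each run yields a mapping from exercise name to whether all unit tests passed; the function names and the exercise names are hypothetical.

```python
from typing import Dict, List


def pass_rate(results: Dict[str, bool]) -> float:
    """Fraction of exercises whose unit tests all passed."""
    if not results:
        return 0.0
    return sum(results.values()) / len(results)


def regressions(old: Dict[str, bool], new: Dict[str, bool]) -> List[str]:
    """Exercises that passed with the old version but not with the new one."""
    return sorted(
        name for name, passed in old.items()
        if passed and not new.get(name, False)
    )


# Hypothetical results for two versions of the tool (made-up exercise names).
old_results = {"exercise-a": True, "exercise-b": False, "exercise-c": True}
new_results = {"exercise-a": True, "exercise-b": True, "exercise-c": False}

print(f"old: {pass_rate(old_results):.0%}, new: {pass_rate(new_results):.0%}")
print("regressed:", regressions(old_results, new_results))
```

Two versions can have the same overall pass rate while still differing on individual exercises, which is why the sketch reports both numbers.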

I checked cdf8f9a and the latest build, and both of them perform pretty much identically on the benchmark. I have also reviewed all the relevant code changes between cdf8f9a and latest, and nothing jumps out at me as a likely cause for a regression.

This probably means that you're just having bad luck with GPT-3.5. It will also tend to do worse as the size of the codebase increases. Some suggestions that might help you:

When you /add files, be selective. Only add the files that GPT needs to see to accomplish the task you're going to give it. This might help GPT-3.5 focus and follow instructions. It also saves you money in token costs.
If you have access to GPT-4, use it instead. It is much more reliable.

I will keep looking into this, but I wanted to get back to you with my research so far.

Jul 11 '23 20:07 paul-gauthier

I am pretty confident that this is not a fluke.

For https://github.com/paul-gauthier/aider/commit/cdf8f9a4b2b4a65993227ac5af1eaf3f1b85c9d8 I now have 229 commits out of 294 commits with aider, and a total of 17k lines from git log -p | wc -l. Thus since I created this issue I have had aider produce twice as much code without a single instance of the parsing issue.

OTOH, I retried main now, and it fails on the first command as in the terminal animation.

A quick git bisect points to de02038a7e25f4edeb3d2992a53be6debd193664 being the commit that introduces this issue. I have not checked this too deeply, I will double check after some meetings.

Edit: can confirm that 7c56dcb (parent to de02038) is the last working commit.
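For readers unfamiliar with the technique, the search that git bisect performs can be sketched as a binary search over a linear history. This is a simplified illustration under stated assumptions: real git bisect also handles branching history, and the commit labels and the `is_bad` check below are hypothetical stand-ins for "run aider at this commit and see whether the problem appears".

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Find the first bad commit in a list ordered oldest to newest.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical linear history; "c4" is where the problem first appears.
commits = ["good-0", "c1", "c2", "c3", "c4", "c5", "bad-head"]


def is_bad(commit: str) -> bool:
    # Stand-in for manually testing the checked-out commit.
    return commits.index(commit) >= 4


print(first_bad(commits, is_bad))
```

Each step halves the remaining range, so only a handful of manual checks are needed even across many commits. The result is only as reliable as each good/bad judgment, which matters when the symptom is intermittent.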

Jul 13 '23 10:07 alexanderkjeldaas

Again, thank you for your diligent efforts to track down this issue!

The https://github.com/paul-gauthier/aider/commit/de02038a7e25f4edeb3d2992a53be6debd193664 commit makes an innocuous change to the /token command that prints a report of context window token usage. It is unrelated to the way aider communicates with GPT about code changes.

But that commit is on a branch for PR #58 which rationalized the support for unicode encodings. So it's possible your bisect is just telling us that something in that PR is causing a problem. Your screencasts above seem to be from a Mac, which handles files as utf-8 by default. The PR was mainly to help Windows users, because Windows doesn't default to utf-8.

Any chance some of your files have non-utf8 encodings?
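One way to answer this question directly is to try decoding each file as UTF-8. The following is a minimal sketch; the helper names are hypothetical and it is not part of aider.

```python
from pathlib import Path
from typing import Iterable, List


def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (plain ASCII does too)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def non_utf8_files(paths: Iterable[str]) -> List[str]:
    """Return the subset of paths that are regular files and not valid UTF-8."""
    return [
        p for p in paths
        if Path(p).is_file() and not is_utf8(Path(p).read_bytes())
    ]


# The same text encoded two ways: only the UTF-8 bytes decode as UTF-8.
print(is_utf8("café".encode("utf-8")))
print(is_utf8("café".encode("latin-1")))
```

The list of tracked files from git ls-files could be passed to `non_utf8_files`. An empty result would suggest that file encodings are not the cause.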

Another thing you could try is to run aider --verbose. This will show you everything (!) that aider is sending to GPT. You could run it that way on https://github.com/paul-gauthier/aider/commit/cdf8f9a4b2b4a65993227ac5af1eaf3f1b85c9d8 and on the latest and inspect the output to look for differences.

You should be able to do it all as a "one liner" like this, to redirect all the debug data to files so you can properly diff them.

aider gitlab_epic_analyzer/* README.md pyproject.toml .env \
    --verbose \
    --msg "add a function get_projects in GitlabAPI that finds all projects in a group and its subgroups" \
    > verbose-output.txt
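Once the verbose output from each version has been captured, the two captures can be compared with Python's standard difflib module. This is a minimal sketch: the sample lines below are invented placeholders rather than real aider output, and in practice the line lists would be read from the two saved files.

```python
import difflib
from typing import List


def diff_lines(old: List[str], new: List[str],
               old_name: str = "old", new_name: str = "new") -> List[str]:
    """Return a unified diff of two lists of lines (empty if identical)."""
    return list(
        difflib.unified_diff(old, new, fromfile=old_name, tofile=new_name, lineterm="")
    )


# Placeholder content standing in for two captured verbose logs.
old_log = ["system prompt, variant A", "user request"]
new_log = ["system prompt, variant B", "user request"]

for line in diff_lines(old_log, new_log):
    print(line)
```

Lines prefixed with "-" appear only in the older capture and lines prefixed with "+" only in the newer one, so a changed prompt or a differently encoded file shows up immediately.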

Jul 13 '23 14:07 paul-gauthier

I think most files should be parseable as utf8, but these are the encodings:

git ls-files | xargs file
.env.example:                                        ASCII text
.gitignore:                                          ASCII text
.gitlab-ci.yml:                                      ASCII text
.pylintrc:                                           ASCII text
Dockerfile:                                          ASCII text
README.md:                                           ASCII text, with very long lines (859)
gitlab_epic_analyzer/__init__.py:                    empty
gitlab_epic_analyzer/cache.py:                       Python script text executable, ASCII text
gitlab_epic_analyzer/constraints_solver.py:          Python script text executable, Unicode text, UTF-8 text
gitlab_epic_analyzer/epic.py:                        Python script text executable, ASCII text
gitlab_epic_analyzer/epic_set.py:                    Python script text executable, Unicode text, UTF-8 text
gitlab_epic_analyzer/gitlab_api.py:                  Python script text executable, ASCII text
gitlab_epic_analyzer/gitlab_epic_analyzer.py:        Python script text executable, ASCII text
gitlab_epic_analyzer/issue.py:                       Python script text executable, ASCII text
gitlab_epic_analyzer/issue_ref.py:                   Python script text executable, ASCII text
gitlab_epic_analyzer/links.py:                       Python script text executable, ASCII text
gitlab_epic_analyzer/main.py:                        Python script text executable, ASCII text
gitlab_epic_analyzer/milestone.py:                   Python script text executable, ASCII text
gitlab_epic_analyzer/milestone_set.py:               Python script text executable, ASCII text
gitlab_epic_analyzer/mini_roadmap.py:                Python script text executable, ASCII text
gitlab_epic_analyzer/mini_roadmap_set.py:            Python script text executable, ASCII text
gitlab_epic_analyzer/project.py:                     Python script text executable, ASCII text
gitlab_epic_analyzer/project_set.py:                 Python script text executable, ASCII text
gitlab_epic_analyzer/slack.py:                       Python script text executable, ASCII text
gitlab_epic_analyzer/tests/__init__.py:              ASCII text
gitlab_epic_analyzer/tests/test_epic.py:             Python script text executable, ASCII text
gitlab_epic_analyzer/tests/test_epic_set.py:         empty
gitlab_epic_analyzer/tests/test_gitlab_api.py:       Python script text executable, ASCII text
gitlab_epic_analyzer/tests/test_issue.py:            Python script text executable, ASCII text
gitlab_epic_analyzer/tests/test_milestone.py:        Python script text executable, ASCII text
gitlab_epic_analyzer/tests/test_milestone_set.py:    Python script text executable, ASCII text
gitlab_epic_analyzer/tests/test_mini_roadmap.py:     Python script text executable, ASCII text
gitlab_epic_analyzer/tests/test_mini_roadmap_set.py: Python script text executable, ASCII text
gitlab_epic_analyzer/tests/test_project.py:          Python script text executable, ASCII text
gitlab_epic_analyzer/tests/test_project_set.py:      Python script text executable, ASCII text
poetry.lock:                                         ASCII text, with very long lines (809)
pyproject.toml:                                      ASCII text
slack/manifest.yml:                                  ASCII text
tests/__init__.py:                                   empty
tox.ini:                                             ASCII text

Jul 13 '23 15:07 alexanderkjeldaas

Actually, I see that HEAD is no longer broken. It simply shows a very large diff that covers the whole file, and then gradually makes it smaller.

Previously it would only show the not-yet-processed parts of the file as it was being processed.

See here

Jul 13 '23 15:07 alexanderkjeldaas

So all those spurious "red delete diffs" are a bug that was fixed in v0.8.3.

I don't think your screen recording is actually running the latest version of aider. Since v0.8.3, aider announces its version when you launch it, like this:

aider$ git checkout v0.8.3
...
HEAD is now at a70ffdd version bump to 0.8.3

aider$ aider
Aider v0.8.3 <=======
Model: gpt-4
Git repo: .git
Repo-map: universal-ctags using 1024 tokens
...

I don't see any version number in your screen recording, so you seem to be running something earlier than v0.8.3?
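The version check described above can also be done mechanically on captured startup output. The sketch below assumes the banner has the form "Aider vX.Y.Z" as shown in the transcript; other builds may print it differently, and the helper names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches a startup banner line such as "Aider v0.8.3" (format assumed from the transcript).
BANNER_RE = re.compile(r"^Aider v(\d+)\.(\d+)\.(\d+)", re.MULTILINE)


def banner_version(output: str) -> Optional[Tuple[int, ...]]:
    """Extract (major, minor, patch) from startup output, or None if no banner is found."""
    match = BANNER_RE.search(output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_at_least(output: str, minimum: Tuple[int, ...] = (0, 8, 3)) -> bool:
    """True only if a banner is present and its version is >= minimum."""
    version = banner_version(output)
    return version is not None and version >= minimum


print(banner_version("Aider v0.8.3\nModel: gpt-4"))
print(is_at_least("Model: gpt-4\nGit repo: .git"))
```

Output with no banner is treated as older than v0.8.3, matching the reasoning that a missing version line points to an earlier build.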

I guess the main question is: are you actually seeing problems with the coding in the newest builds, or did the spurious "red delete diffs" just trick you into thinking GPT-3.5 was mangling the file?

Jul 13 '23 16:07 paul-gauthier

Yes, sorry, my last recording was made at the wrong commit; I was still in the middle of my bisect. On HEAD I am also getting proper output now, so I guess 🤷‍♂️

https://asciinema.org/a/lRvYB02KwUItpTddMQaOZ7B60

Maybe this issue can be closed. I'll update if I get back to the previous state.

Jul 13 '23 22:07 alexanderkjeldaas

Sure, feel free to re-open or file a new issue any time if you see problems.

Thanks again for all the effort debugging. I take possible code editing regressions pretty seriously, and am always happy to try and chase down potential problems like this.

Jul 14 '23 00:07 paul-gauthier
