gptel
gptel copied to clipboard
gptel-context: Allow exclusion of gitignored files
Continuing the discussion here, this PR allows the user to exclude, by setting the user option gptel-context-exclude-gitignored to t, gitignored files or directories from the context.
Tagging participants in previous discussion: @hartikainen, @collinarnett, @metachip, @sandinmyjoints.
(I just noticed that I forgot to set the default value of gptel-context-exclude-gitignored to nil, but now I wonder if we should leave it as t. I don’t have a clear view on this.)
I would set it to t by default. I tested out the new add directory functionality on a repository with node_modules folder and it froze my emacs adding thousands of files to the context window. Other utilities like RepoMix enable gitignore exclusion by default as well.
Could you move all the git-related functions gptel-context--git-files, gptel-context--skip-file-p, gptel-context--message-git-skipped and gptel-context--build-git-cache into one place in the file? You can add a ;;; git tracking functions header over them for navigation.
Additionally, anywhere you run a file-name function like file-relative-name on a file, please check that the file argument can't be nil. I haven't analyzed all possible cases.
locate-dominating-file can be very slow on remote files, to the point of locking up Emacs for a while.
It might be worth using file-remote-p and skipping the gitignore check if it's a remote file.
@karthink, I believe I have now addressed all your requested changes. Let me know if you have further comments.
Late to the party here (and I wouldn't want to derail this feature; I was recently bitten by it), but was the use of project.el (via project-files) considered now that it's included in emacs? That would get you cross-VCS support for free.
Hi @vermiculus. I didn’t consider using project.el because I am not familiar with that package. Let me take a look and I’ll see if I can adapt it.
@vermiculus: I revised the PR to make it use project.el. The code is simpler now.
It would be great if folks could test this before it is merged: I’m not a programmer and my eyes may fail to catch problems.
I see some whitespace changes that apparently just change spaces to TABs, most likely since emacs has indent-tabs-mode on by default, and especially if you have aggressive-indent-mode on, it can cause unecessary changes in places you did not even meant to touch. (This is why I submitted #712.) In case you remove those changes, consider doing some squashing as well, this PR has currently 34 commits and I think some of them could be squashed together to make the git history cleaner.
I think I'm basically done with the changes here. My only concern is that (member file gptel-context--project-files) may be a bit slow for projects with lots of files. I believe this issue arose after we started using project.el, and happens because (member file gptel-context--project-files) has to linearly search through the entire list of project files to check if a file is present. Claude suggests using a hash table, but I don’t know enough Elisp to implement this suggestion.
If and when @karthink agrees to merge this PR, I'm happy to do some squashing to clean up the commit history.
(If the performance issue is not addressed, I am now inclined to set the default value of gptel-context-restrict-to-project-files to nil.)
Hi @karthink, and great work @benthamite on this PR!
This feature is excellent! For me, it addresses a pressing need, as it was the first thing I looked for when I started adding projects to the context.
@karthink, is there an estimated timeline for merging this? I'd be happy to contribute by implementing the hash-table optimisation that was discussed, if that would help move this PR forward towards main.
@karthink, is there an estimated timeline for merging this? I'd be happy to contribute by implementing the hash-table optimisation that was discussed, if that would help move this PR forward towards main.
I've just been busy with other features (and the new release) is all. I had some performance concerns about this PR, but they might be fixed now. I need to check.
Is adding entire projects to the context even viable? It seems like a lot, not to mention inefficient. Maybe I'm just behind in my thinking, context windows have been increasing recently.
FWIW, I do agree it’s generally inefficient. My workflow for integrating AI with projects has changed since submitting this PR, and now I almost always add selected project files rather than all files in a project. Still, I guess that if one wants to do the latter, one typically wants to exclude gitignored files, so I think this is probably still a useful feature to have (modulo the performance concerns, which I agree are valid).
Still, I guess that if one wants to do the latter, one typically wants to exclude gitignored files
Agreed, excluding the .gitignored files would be a very useful feature in my opinion
Waiting on this feature. Thank you.
@karthink, shall we make a decision regarding this? If you don’t want to merge it, that’s totally fine by me, but I would prefer to close it so that people are not kept waiting.
@karthink, shall we make a decision regarding this? If you don’t want to merge it, that’s totally fine by me, but I would prefer to close it so that people are not kept waiting.
I wanted to merge this today, but the cache check is too fragile. Here is an example of the kind of bug you can produce:
- Run
M-x gptel-context-add-fileand navigate to a file from another project. Add it. - Now add a gitignored file from the current project. It will be added too.
Changed the caching logic, fixed some bugs and merged. Thanks for the PR @benthamite.