Add Flag to Automatically Exclude .gitignore
This pull request introduces a new CLI flag (--use-gitignore) that enhances Gitingest by automatically loading and applying ignore patterns from all .gitignore files found in the target repository or directory. When enabled, files and directories matching any pattern specified in any .gitignore are excluded from the generated text digest.
Key Changes:
- CLI Update:
  - Modified src/gitingest/cli.py to add a new option --use-gitignore that accepts a boolean value.
  - Updated the main() function to pass the new flag to the asynchronous ingestion entry point.
- Ingestion Entry Point:
  - Updated src/gitingest/entrypoint.py to include a new parameter use_gitignore.
  - Integrated a call to the new helper function load_gitignore_patterns() (from src/gitingest/utils/ignore_patterns.py) to update the query’s ignore patterns with all patterns extracted from .gitignore files.
- Gitignore Loader:
  - Implemented load_gitignore_patterns() in src/gitingest/utils/ignore_patterns.py, which recursively searches for .gitignore files starting from the repository root and aggregates their ignore patterns.
  - Added comprehensive docstrings to the loader function to adhere to coding style guidelines.
- Testing:
  - Created new tests in tests/test_gitignore_feature.py to verify that:
    - With --use-gitignore enabled, files matching .gitignore patterns are excluded from the digest.
    - Without the flag, all files are included.
  - Fixed linting and formatting issues in tests and source files.
This feature provides a seamless way to respect repository-level ignore rules, ensuring that the generated digest is more relevant for ingestion by large language models. It improves usability by reducing the need for manual pattern exclusions and aligns the tool’s behavior more closely with Git’s own ignore logic.
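For readers who want a concrete picture of the loader, here is a minimal sketch of what a recursive .gitignore collector could look like. It is not the code from this PR: only the name load_gitignore_patterns() comes from the description, while the signature, the return type, and the choice to prefix nested patterns with their directory are assumptions made for illustration. Real Git semantics (for example, slash-less patterns matching at any depth) are more involved than this.

```python
from pathlib import Path


def load_gitignore_patterns(root: Path) -> list[str]:
    """Collect ignore patterns from every .gitignore under ``root``.

    Hypothetical sketch: blank lines and comments are skipped, and patterns
    from nested .gitignore files are prefixed with their directory (relative
    to ``root``) so they only apply below that directory.
    """
    patterns: list[str] = []
    # sorted() keeps the result deterministic across platforms
    for gitignore in sorted(root.rglob(".gitignore")):
        rel_dir = gitignore.parent.relative_to(root)
        for raw in gitignore.read_text(encoding="utf-8").splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            if rel_dir == Path("."):
                patterns.append(line)
                continue
            # Keep a leading "!" (negation) in front of the directory prefix
            negated = line.startswith("!")
            body = line[1:] if negated else line
            prefixed = f"{rel_dir.as_posix()}/{body.lstrip('/')}"
            patterns.append(f"!{prefixed}" if negated else prefixed)
    return patterns
```

The returned list could then be merged into the query's existing ignore patterns before the directory walk starts.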
@ArmanJR Thanks for your contribution. I've looked at the code and it looks OK, though I still need to run some tests myself.
To merge this, I think we would need to reflect those changes in the front-end so gitingest keeps 1:1 feature parity no matter where it is accessed from.
Here's a rough sketch of how it could look:
Do you think you can handle this or do you want us to help you with that?
I'll try :)
Very useful feature!
I’d suggest enabling it by default and renaming the flag to something like --no-gitignore to ensure a safer default behavior.
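To make the opt-out idea concrete, here is a small sketch using the standard library's argparse as a stand-in; it is not Gitingest's CLI code. The parser, the positional source argument, and the destination name use_gitignore are illustrative assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser where .gitignore is respected by default."""
    parser = argparse.ArgumentParser(prog="gitingest")
    parser.add_argument("source", nargs="?", default=".")
    # store_false: use_gitignore is True unless --no-gitignore is passed
    parser.add_argument(
        "--no-gitignore",
        dest="use_gitignore",
        action="store_false",
        help="include files that .gitignore would normally exclude",
    )
    return parser


args = build_parser().parse_args(["./"])
print(args.use_gitignore)  # True: ignored files are skipped by default
```

With this shape, existing invocations need no extra flag, and only the unusual "include everything" case requires one.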
@cyclotruc I believe a checkbox in the UI is redundant, since files listed in .gitignore are typically not present in a GitHub repo anyway. The intent behind my PR is to make it possible to ignore those files when running gitingest on a local repo via the CLI.
@cyclotruc bump!
Hi, sorry for the delay. I've been busy but will come back to this soon. Thanks again for your patience.
This will be a great feature. I expected this behaviour by default and was surprised to find some credentials in digest.txt after running on a local project.
Also, there is another issue when calling gitingest ./ locally. When you run it a second time, it includes the previous digest.txt in the new digest file:
~/code/Go/sandbox  10:39:58 AM
❯ echo "boz hi" > main.go
~/code/Go/sandbox  10:40:18 AM
❯ cat main.go
boz hi
~/code/Go/sandbox  10:40:21 AM
❯ gitingest ./
Analysis complete! Output written to: digest.txt
Summary:
Repository: ./
Files analyzed: 1
Estimated tokens: 29
❯ cat digest.txt
Directory structure:
└── .//
    └── main.go
================================================
File: /main.go
================================================
boz hi
~/code/Go/sandbox  10:40:32 AM
❯ gitingest ./
Analysis complete! Output written to: digest.txt
Summary:
Repository: ./
Files analyzed: 2
Estimated tokens: 73
~/code/Go/sandbox  10:40:37 AM
❯ cat digest.txt
Directory structure:
└── .//
    ├── digest.txt
    └── main.go
================================================
File: /digest.txt
================================================
Directory structure:
└── .//
    └── main.go
================================================
File: /main.go
================================================
boz hi
================================================
File: /main.go
================================================
boz hi
I believe since users usually don't double-check the content of digest.txt, it's better to ignore digest.txt by default.
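One possible way to do that is to skip the output file while collecting files. The sketch below is only an illustration of the idea: collect_files and its output_name parameter are hypothetical names, not functions from Gitingest.

```python
from pathlib import Path


def collect_files(root: Path, output_name: str = "digest.txt") -> list[Path]:
    """Return files under ``root``, skipping a previous digest in ``root``."""
    output_path = (root / output_name).resolve()
    files: list[Path] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        if path.resolve() == output_path:
            continue  # never feed the previous digest back into the new one
        files.append(path.relative_to(root))
    return files
```

Comparing resolved paths rather than bare file names means only the digest written to the top-level directory is skipped, not every file that happens to be called digest.txt.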
@ArmanJR The tests are failing. Can you have a look at it?
@ArmanJR Thank you for the contribution. We're interested in this feature. Do you want to continue working on it, or should we take it from there? Happy to help if you want to finish.
Of course, I'd be happy to help. Is there anything else that should be implemented?
Suggestion on flag semantics & naming
- Respect .gitignore by default. Most developer-facing CLIs (ripgrep, fd, etc.) do this because it’s safer (no secrets or build artefacts leak).
- Invert the flag so users opt in when they genuinely want the extra noise. Two workable spellings:
  - --no-gitignore (mirrors ripgrep --no-ignore)
  - --include-gitignored (reads like “please pull in the files that are normally ignored”).
- Whatever name we choose, the default should be True so existing scripts keep working and only the rare cases need the override:

      # normal – git-ignored files skipped
      gitingest
      # exceptional – include everything
      gitingest --no-gitignore

- Implementation note: using the pathspec library would give us full Git-wildmatch coverage (negations, **, order-aware precedence) practically for free. We might also want to reimplement the _should_include and _should_exclude functions, which currently use fnmatch, on top of pathspec.
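To illustrate the gap between plain fnmatch and Git's rules: in a .gitignore the last matching pattern decides the outcome, and a leading "!" re-includes a path. The following is a simplified standard-library sketch of just that rule. It is not how pathspec is implemented and it does not cover full wildmatch behaviour (no ** handling, no directory anchoring); the helper name is_ignored is hypothetical.

```python
from fnmatch import fnmatch


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Apply patterns in order; the last one that matches wins."""
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if fnmatch(path, glob):
            # A negated match re-includes the path, a plain match excludes it
            ignored = not negated
    return ignored


patterns = ["*.log", "!keep.log"]
print(is_ignored("debug.log", patterns))  # True
print(is_ignored("keep.log", patterns))   # False: re-included by "!keep.log"
# Plain fnmatch treats "!" as a literal character, so negation is lost:
print(fnmatch("keep.log", "!keep.log"))   # False
```

A library that implements Git's matching rules handles this, plus the cases the sketch leaves out, without any hand-written precedence logic.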
Thanks a lot @ArmanJR!
Just a follow-up question: what was the reason for the pathspec>=0.12.1 version constraint?
@filipchristiansen You mean why 0.12.1? I think 0.12.0 had a bug