Make `scan --ignore FILENAME` apply to blobs in Git repositories
The `scan` command currently has a `--ignore FILENAME` option, which lets one specify a gitignore-style rules file for paths to ignore when scanning. Those ignore rules are applied only to plain files that are scanned, not to blobs found within Git repositories. They should also apply to Git blobs.
This is probably dependent on #16 being completed first.
This feature could be useful when dealing with scanning monorepos on a per-project basis: https://github.com/praetorian-inc/noseyparker/discussions/119
To implement this today, the most expedient approach:
- Modify `GitRepoWithMetadataEnumerator` to add a `GitIgnore` field, initialized from the same ignore rules file used in the filesystem enumerator
- Modify the enumeration code to test each blob path against the `GitIgnore` matcher, only keeping those blobs that are not ignored
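
The two steps above can be sketched roughly as follows. This is a stdlib-only illustration, not Nosey Parker's actual code: `IgnoreMatcher`, `ComponentMatcher`, `BlobEntry`, `FilterOutcome`, and `filter_blobs` are all hypothetical names, and the toy matcher does not implement gitignore semantics. Keeping blobs that have no known path, and counting them so a warning could be emitted, is one possible answer to an open question rather than a decided behavior.

```rust
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the matcher built from the `--ignore` rules file.
/// A trait keeps this sketch free of external crates.
trait IgnoreMatcher {
    fn is_ignored(&self, path: &Path) -> bool;
}

/// Toy matcher for illustration only: ignores any path that has a component
/// equal to one of the given names. This is NOT gitignore semantics.
struct ComponentMatcher {
    names: Vec<String>,
}

impl IgnoreMatcher for ComponentMatcher {
    fn is_ignored(&self, path: &Path) -> bool {
        path.components()
            .any(|c| self.names.iter().any(|n| c.as_os_str() == OsStr::new(n)))
    }
}

/// Hypothetical blob record: an object id plus the path from the commit that
/// first introduced the blob, when one could be determined.
#[derive(Debug, Clone, PartialEq)]
struct BlobEntry {
    id: String,
    first_path: Option<PathBuf>,
}

/// Blobs that survive filtering, plus a count of blobs that had no path.
struct FilterOutcome {
    kept: Vec<BlobEntry>,
    pathless: usize,
}

/// Keep only blobs whose first-seen path is not ignored.
/// Assumption: blobs without a known path are kept and counted.
fn filter_blobs<M: IgnoreMatcher>(blobs: Vec<BlobEntry>, matcher: &M) -> FilterOutcome {
    let mut kept = Vec::new();
    let mut pathless = 0;
    for blob in blobs {
        // Decide first, then move the blob, so no borrow outlives the check.
        let ignored = blob.first_path.as_deref().map(|p| matcher.is_ignored(p));
        match ignored {
            Some(true) => {} // path matched an ignore rule: drop the blob
            Some(false) => kept.push(blob),
            None => {
                pathless += 1;
                kept.push(blob);
            }
        }
    }
    FilterOutcome { kept, pathless }
}
```

In this sketch the filesystem enumerator and the Git enumerator could share one matcher value behind the trait, and the caller would decide whether a nonzero `pathless` count deserves a warning.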
Some complications:
- It seems like the `GitIgnore` struct would have to be duplicated between the filesystem enumerator and git enumerator, since the `ignore` crate doesn't expose the one that it uses
- There are some corner cases in the semantics. If a path cannot be determined for a blob for whatever reason, should there be a warning?
- The best that Nosey Parker could do is filter against the pathname for a blob from the commit where it was first introduced. But a blob may have multiple different paths in its entire history; only the first pathname would be used when making the "should ignore?" decision for the blob.
There is also a general oddity, or surprising behavior, in Nosey Parker's ignore rules. The ignore rules are `.gitignore`-style rules. The semantics of those rules are that they are relative to the directory that contains the `.gitignore` file. However, Nosey Parker uses this format to specify global rules: they are not intended to be directory-specific. The end result is that, essentially, all Nosey Parker ignore rules have to start with `**/`.
Perhaps the entire path-based ignore mechanism needs some rethinking in Nosey Parker.