osv-scanner icon indicating copy to clipboard operation
osv-scanner copied to clipboard

Don't traverse gitignored dirs for gitignore files

Open robotdana opened this issue 2 years ago • 1 comments

this seems to be an upstream issue in go-git, and i'll prepare a PR for them soon, but for now this copies in the affected function and fixes it by checking the accumulated patterns while walking the fs looking for gitignore files

fixes: #389

robotdana avatar May 26 '23 05:05 robotdana

Thank you! The gitignore while traversing fix looks good, though we still should ignore sub-directories when the -r flag is not being used in osv-scanner. (Probably requires making readIgnoreFile a public function)

another-rex avatar May 26 '23 05:05 another-rex

Thank you! The gitignore while traversing fix looks good, though we still should ignore sub-directories when the -r flag is not being used in osv-scanner. (Probably requires making readIgnoreFile a public function)

Hi @another-rex. Do you mean

  1. we need to tweak the recursion system that's part of the of filepath.WalkDir() call

or

  1. you want to have functions in the gitignore package that only read a single .gitignore file at the root of the repo. I'm assuming we'd pass the recursive bool into osvscanner.parseGitIgnores() to handle this (ref, ref). That would then find any enclosing git-repo, but not only read the .gitignore at its root once it had done that. Did you also want to exclude .git/info/exclude in that case?

robramsaynz avatar Jan 31 '24 23:01 robramsaynz

Thank you! The gitignore while traversing fix looks good, though we still should ignore sub-directories when the -r flag is not being used in osv-scanner. (Probably requires making readIgnoreFile a public function)

Hi @another-rex. Do you mean

  1. we need to tweak the recursion system that's part of the of filepath.WalkDir() call

or

  1. you want to have functions in the gitignore package that only read a single .gitignore file at the root of the repo. I'm assuming we'd pass the recursive bool into osvscanner.parseGitIgnores() to handle this (ref, ref). That would then find any enclosing git-repo, but not only read the .gitignore at its root once it had done that. Did you also want to exclude .git/info/exclude in that case?

I think I meant the second option. If the recursive option is not passed in, we shouldn't look in sub-directories for gitignore files (since it'll never apply to what is being scanned).

So the only .gitignores that should apply is every .gitignore in the current and each parent/ancestor directory until we reach the root of the repository. (Or if we are not in a repository, just what is in the current directory). I am trying to match the search behavior to other tools like https://github.com/BurntSushi/ripgrep/blob/master/GUIDE.md#automatic-filtering.

Did you also want to exclude .git/info/exclude in that case?

Huh TIL about .git/info/exclude. I think we should respect the .git/info/exclude as well, since we are trying to match git behavior.

another-rex avatar Feb 01 '24 00:02 another-rex

Thanks for the clarification @another-rex . So you preference is for the command

osv-scanner ~/projects/_git_repo/dir_a/dir_b

to pick up:

  • ~/projects/git_repo/.gitignore
  • ~/projects/git_repo/dir_a/.gitignore
  • ~/projects/git_repo/dir_a/dir_b/.gitignore

but not

  • ~/projects/git_repo/dir_a/dir_b/subdir/.gitignore
  • ~/projects/git_repo/not_in_original_path/.gitignore

Is that about right?

robramsaynz avatar Feb 01 '24 01:02 robramsaynz

Huh TIL about .git/info/exclude. I think we should respect the .git/info/exclude as well, since we are trying to match git behavior

There's also a /etc/gitconfig file, and user-profile setting of the core.excludesfile property in ~/.gitconfig. The upstream lib has parsing functions for these:

  • https://github.com/go-git/go-git/blob/v5.7.0/plumbing/format/gitignore/dir.go#L115-L129
  • https://github.com/go-git/go-git/blob/v5.7.0/plumbing/format/gitignore/dir.go#L131-L140

which we haven't imported into this code.

robramsaynz avatar Feb 01 '24 01:02 robramsaynz

Thanks for the clarification @another-rex . So you preference is for the command

osv-scanner ~/projects/_git_repo/dir_a/dir_b

to pick up:

  • ~/projects/git_repo/.gitignore
  • ~/projects/git_repo/dir_a/.gitignore
  • ~/projects/git_repo/dir_a/dir_b/.gitignore

but not

  • ~/projects/git_repo/dir_a/dir_b/subdir/.gitignore
  • ~/projects/git_repo/not_in_original_path/.gitignore

Is that about right?

Yep, specifically to not even traverse the subdirs to find those .gitignore files.

another-rex avatar Feb 01 '24 05:02 another-rex

Huh TIL about .git/info/exclude. I think we should respect the .git/info/exclude as well, since we are trying to match git behavior

There's also a /etc/gitconfig file, and user-profile setting of the core.excludesfile property in ~/.gitconfig. The upstream lib has parsing functions for these:

  • https://github.com/go-git/go-git/blob/v5.7.0/plumbing/format/gitignore/dir.go#L115-L129
  • https://github.com/go-git/go-git/blob/v5.7.0/plumbing/format/gitignore/dir.go#L131-L140

which we haven't imported into this code.

Huh... Let's not worry about those ignores for now until we can update upstream then. I originally thought we got that for free with the go git library, but since we are copying it in, happy to just focus on .gitignore files for now.

another-rex avatar Feb 01 '24 05:02 another-rex