Avoid noise from patterns with SHA: prefix in the Old GitHub detector

Open dinvlad opened this issue 3 years ago • 2 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Description

Improve the regex patterns for the Old GitHub detector so that they do not match 40-character hex strings preceded by a SHA: prefix.

Problem to be Addressed

The "Old GitHub" token detector shows up with many false positives in our environment, where we scan GitHub Actions logs. These logs contain a lot of lines like

2022-08-19T17:43:33.6861115Z Download action repository 'github/super-linter@v4' (SHA:01d3218744765b55c3b5ffbb27e50961e50c33c5)

and these show up as "Old GitHub tokens", e.g. 01d3218744765b55c3b5ffbb27e50961e50c33c5, where the "token" is just the SHA1 of a commit.

This is not surprising (although undesirable) given the current pattern, which matches these lines because they contain the github keyword followed by a 40-character hex string (the commit SHA).
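The match can be reproduced with a simplified stand-in for the detector's pattern. The regex below is an assumption made for illustration (the keyword github, up to 40 arbitrary characters, a delimiter, then 40 lowercase hex characters); it is not copied from the detector's source.

```python
import re

# Hypothetical, simplified stand-in for the "Old GitHub" pattern:
# keyword, up to 40 arbitrary characters, a delimiter, then 40 hex characters.
OLD_GITHUB_LIKE = re.compile(r"github.{0,40}[ =:'\"]+([a-f0-9]{40})\b")

LOG_LINE = (
    "2022-08-19T17:43:33.6861115Z Download action repository "
    "'github/super-linter@v4' (SHA:01d3218744765b55c3b5ffbb27e50961e50c33c5)"
)

match = OLD_GITHUB_LIKE.search(LOG_LINE)
candidate = match.group(1) if match else None

# The commit hash is captured as if it were a token.
print(candidate)
```

Under these assumptions, the colon in SHA: acts as the delimiter, so the commit hash lands in the capture group.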

Description of the Preferred Solution

This seems to be a common enough use case and pattern that I think we should add a lookbehind (or something similar) that excludes matches where a SHA: prefix appears right before the 40-character hex string.

Alternatively, perhaps we should just add SHA: to the known FPs here?
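A minimal sketch of the "something similar" option, written in Python for illustration only. It uses an explicit check of the text before the match rather than a lookbehind, since some regex engines (Go's standard regexp package, for example) do not support lookbehind. The pattern and the names CANDIDATE, is_sha_prefixed, and find_tokens are hypothetical and are not taken from the detector.

```python
import re

# Hypothetical, simplified stand-in for the detector's pattern (an assumption).
CANDIDATE = re.compile(r"github.{0,40}[ =:'\"]+([a-f0-9]{40})\b")
SHA_PREFIX = "SHA:"


def is_sha_prefixed(text: str, start: int) -> bool:
    """True if the text right before index `start` is the SHA: prefix."""
    return text[max(0, start - len(SHA_PREFIX)):start] == SHA_PREFIX


def find_tokens(text: str) -> list[str]:
    """Return candidate tokens, skipping any that directly follow SHA:."""
    return [
        m.group(1)
        for m in CANDIDATE.finditer(text)
        if not is_sha_prefixed(text, m.start(1))
    ]


commit_line = (
    "Download action repository 'github/super-linter@v4' "
    "(SHA:01d3218744765b55c3b5ffbb27e50961e50c33c5)"
)
token_line = "github_token = 01d3218744765b55c3b5ffbb27e50961e50c33c5"

print(find_tokens(commit_line))  # the SHA:-prefixed commit hash is skipped
print(find_tokens(token_line))   # an unprefixed 40-hex value is still reported
```

The check looks at the surrounding text rather than the captured value, so it only drops hashes that are explicitly labeled with SHA:.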

Additional Context

If you think this should be done, I'd be glad to submit a PR. Alternatively, please suggest a workaround (besides excluding the Old GitHub detector altogether) that could filter out such findings.

References

dinvlad avatar Aug 19 '22 19:08 dinvlad

Hey @dinvlad, thanks for reporting this. We'd definitely welcome an improvement here. Unfortunately, I don't think the linked FP exclusion will actually work, because it only applies to the matched content, which would exclude the SHA: prefix as well as the github commit entry that is currently there. It'll need a different approach.
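To illustrate the point, here is a small sketch with a hypothetical FP list and helper (KNOWN_FPS and is_known_fp are made up for this example and are not the project's code). A substring check applied only to the captured token can never see the prefix, while the same check on the whole line can.

```python
# Hypothetical false-positive substrings, mirroring the idea of a known-FP list.
KNOWN_FPS = ["sha:", "github commit"]


def is_known_fp(text: str) -> bool:
    """True if any known-FP substring occurs in `text` (case-insensitive)."""
    lowered = text.lower()
    return any(fp in lowered for fp in KNOWN_FPS)


line = (
    "Download action repository 'github/super-linter@v4' "
    "(SHA:01d3218744765b55c3b5ffbb27e50961e50c33c5)"
)
matched = "01d3218744765b55c3b5ffbb27e50961e50c33c5"  # what the detector captures

print(is_known_fp(matched))  # False: the captured token never contains the prefix
print(is_known_fp(line))     # True: only the surrounding line carries "SHA:"
```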

dustin-decker avatar Sep 01 '22 00:09 dustin-decker

Interesting, I see. Do you have any suggestions on what could work here?

dinvlad avatar Sep 01 '22 14:09 dinvlad

Hey Dinvlad, please use the --only-verified flag to sort these out. We test all the keys, so that flag tells you which ones are valid and which ones you don't need to worry about.

dxa4481 avatar Nov 23 '22 18:11 dxa4481