Avoid noise from patterns with SHA: prefix in the Old GitHub detector
Description
Improve regex patterns for the Old GitHub detector to avoid tokens with SHA: prefix.
Problem to be Addressed
The "Old GitHub" token detector produces many false positives in our environment, where we scan GitHub Actions logs. These logs contain many lines like:
2022-08-19T17:43:33.6861115Z Download action repository 'github/super-linter@v4' (SHA:01d3218744765b55c3b5ffbb27e50961e50c33c5)
and these show up as "Old GitHub tokens", e.g. 01d3218744765b55c3b5ffbb27e50961e50c33c5, where the "token" is just the SHA1 of a commit.
This is not surprising (although undesirable) given the current pattern, which matches these lines because they contain the github keyword followed by a 40-character hex string (the commit SHA).
Description of the Preferred Solution
This seems to be a common enough use case and pattern that I think we should add a lookbehind (or something similar) that excludes matches where the SHA: prefix appears right before the 40-character hex string.
Alternatively, perhaps we should just add SHA: to the known FPs here?
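To illustrate the lookbehind idea, here is a minimal Python sketch. The pattern is a simplified stand-in (keyword, then up to 40 characters, then a 40-character hex string) and not the detector's actual regex; the names NAIVE, GUARDED, and LOG_LINE are made up for this example. Note that RE2-style engines such as Go's regexp package do not support lookarounds, so there the same effect would likely need a check after the match instead.

```python
import re

# Hypothetical simplification of a "keyword followed by 40 hex chars" detector.
# This is NOT the real Old GitHub pattern, only a stand-in for illustration.
NAIVE = re.compile(r"github.{0,40}?\b([a-f0-9]{40})\b", re.IGNORECASE)

# Same pattern, plus a negative lookbehind that rejects a hex string
# immediately preceded by "SHA:".
GUARDED = re.compile(r"github.{0,40}?(?<!SHA:)\b([a-f0-9]{40})\b", re.IGNORECASE)

# Log line taken from the issue description.
LOG_LINE = (
    "2022-08-19T17:43:33.6861115Z Download action repository "
    "'github/super-linter@v4' (SHA:01d3218744765b55c3b5ffbb27e50961e50c33c5)"
)

# A line that should still be reported (made-up variable name, same hex value).
SECRET_LINE = "github_token = 01d3218744765b55c3b5ffbb27e50961e50c33c5"

for name, pattern in (("naive", NAIVE), ("guarded", GUARDED)):
    for label, line in (("log", LOG_LINE), ("secret", SECRET_LINE)):
        found = pattern.search(line) is not None
        print(f"{name:8} {label:7} matched={found}")
```

The lookbehind is fixed-width, so it only covers the exact form `SHA:` directly before the hex string; a variant such as `SHA: ` with a trailing space would still match and would need its own handling.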
Additional Context
If you think this should be done, I'd be glad to submit a PR. Alternatively, please suggest a workaround (besides excluding the Old GitHub detector altogether) that could filter out such findings.
References
Hey @dinvlad, thanks for reporting this. We'd definitely welcome an improvement here. Unfortunately, I don't think the linked FP exclusion will actually work, because it only applies to the matched content, which would exclude the SHA: prefix (and also the github commit entry that is currently there). It'll need a different approach.
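A small Python sketch of why an exclusion on the matched content would miss this case, assuming the exclusion only ever sees the captured hex string. The pattern and both helper functions are hypothetical and are not part of the scanner's code.

```python
import re

# Hypothetical stand-in pattern; the capture group is the "matched content".
PATTERN = re.compile(r"github.{0,40}?\b([a-f0-9]{40})\b", re.IGNORECASE)

LINE = (
    "2022-08-19T17:43:33.6861115Z Download action repository "
    "'github/super-linter@v4' (SHA:01d3218744765b55c3b5ffbb27e50961e50c33c5)"
)


def excluded_by_match_content(match, needle="SHA:"):
    """Exclusion that only inspects the captured token itself."""
    return needle in match.group(1)


def excluded_by_preceding_text(match, line, needle="SHA:"):
    """Exclusion that inspects the text directly before the captured token."""
    start = match.start(1)
    return line[max(0, start - len(needle)):start] == needle


m = PATTERN.search(LINE)
print(m.group(1))                           # the captured hex string only
print(excluded_by_match_content(m))         # the prefix is outside the capture
print(excluded_by_preceding_text(m, LINE))  # the prefix is visible in the context
```

A filter along the lines of the second helper needs access to the surrounding line, which is the kind of different approach the reply points to.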
Interesting, I see. Do you have any suggestions on what could work here?
Hey Dinvlad, please use the --only-verified flag to sort these out. We test all the keys, so that flag tells you which ones are valid and which ones you don't need to worry about.