trufflehog icon indicating copy to clipboard operation
trufflehog copied to clipboard

Remove `if len(match) != 2` check in detectors

Open rgmz opened this issue 2 months ago • 8 comments

Description:

I believe the intention behind this code was to make sure that a given pattern matched, but it doesn't actually do this. A while ago, I explained in a PR comment how the if len(match) != 2 check doesn't necessarily do what it appears to, and how it can actually silently break detectors.

In short, FindAllStringSubmatch only returns results that match the given pattern. The number of match groups in a specific match is static, meaning that the pattern (fo.)bar(.?) will always have 3 groups even if the group after bar is empty. If the number of match groups in a pattern is changed and the if statement isn't, it will cause matches to be silently discarded.

https://go.dev/play/p/TXYh1bQItaO

Checklist:

  • [ ] Tests passing (make test-community)?
  • [ ] Lint passing (make lint this requires golangci-lint)?

rgmz avatar Apr 25 '24 12:04 rgmz

Even I was curious as to know when and how many times does this if len(match) != 2 is equating to true. So I added a print statement for every instance of it for all the detectors in the first week of Jan this year.

I have only seen this message for jira token detector few times for domainPat and emailPat. But I haven't done the root cause analysis for the same.

image

bugbaba avatar Apr 26 '24 02:04 bugbaba

The JIRA email is slightly different because it's checking the length of the split and not the match groups. I don't really understand it's purpose, though.

https://github.com/trufflesecurity/trufflehog/blob/d1a29f74a900d5586a7d0af828107a37eab66ad0/pkg/detectors/jiratoken/v1/jiratoken.go#L56-L59

rgmz avatar Apr 26 '24 13:04 rgmz

@rgmz I believe this check is to prevent index errors like the one demonstrated here https://go.dev/play/p/MS8AyzGb1uq. Rather than removing the check, it would be better if we change the condition to if len(match) < 2 to ensure at least one capture group is present in the regex.

zricethezav avatar May 01 '24 15:05 zricethezav

That still has the problem of silently ignoring legitimate errors. I believe that logic errors should explicitly fail, rather than quietly continuing.

rgmz avatar May 01 '24 15:05 rgmz

That still has the problem of silently ignoring legitimate errors. I believe that logic errors should explicitly fail, rather than quietly continuing.

I disagree. It's less of a logic error and more of a incorrectly written regex issue. A single index error on a detector shouldn't cause the whole scan to fail imo.

zricethezav avatar May 01 '24 16:05 zricethezav

Some patterns are precise and don't have any capture groups, so you'd only have one match (the entire matching string).

A single index error on a detector shouldn't cause the whole scan to fail imo.

This error means that the detector can never succeed and will miss valid secrets. It would be easy to detect in the build/test process, so a user should never actually ecounter this issue.

rgmz avatar May 01 '24 16:05 rgmz

Some patterns are precise and don't have any capture groups, so you'd only have one match (the entire matching string).

Then the check could be safely removed if present.

For me to feel comfortable merging this blanket removal would require a test that ensures we would not get an index out of range error.

It would be easy to detect in the build/test process, so a user should never actually ecounter this issue.

I agree with this

zricethezav avatar May 01 '24 16:05 zricethezav

For me to feel comfortable merging this blanket removal would require a test that ensures we would not get an index out of range error.

Ideally, each detector would have unit tests but only a few do at the moment.

The TestX_FromChunk case should cover this, however, it isn't clear what triggers those cases. They don't seem to run with test: https://github.com/trufflesecurity/trufflehog/actions/runs/8912166932/job/24475012246

rgmz avatar May 01 '24 16:05 rgmz