trufflehog
trufflehog copied to clipboard
Remove `if len(match) != 2` check in detectors
Description:
I believe the intention behind this code was to make sure that a given pattern matched, but it doesn't actually do this. A while ago, I explained in a PR comment how the if len(match) != 2
check doesn't necessarily do what it appears to, and how it can actually silently break detectors.
In short, FindAllStringSubmatch
only returns results that match the given pattern. The number of match groups in a specific match is static, meaning that the pattern (fo.)bar(.?)
will always have 3 groups even if the group after bar
is empty. If the number of match groups in a pattern is changed and the if statement isn't, it will cause matches to be silently discarded.
https://go.dev/play/p/TXYh1bQItaO
Checklist:
- [ ] Tests passing (
make test-community
)? - [ ] Lint passing (
make lint
this requires golangci-lint)?
Even I was curious as to know when and how many times does this if len(match) != 2
is equating to true. So I added a print statement for every instance of it for all the detectors in the first week of Jan this year.
I have only seen this message for jira
token detector few times for domainPat
and emailPat
. But I haven't done the root cause analysis for the same.
The JIRA email is slightly different because it's checking the length of the split and not the match groups. I don't really understand it's purpose, though.
https://github.com/trufflesecurity/trufflehog/blob/d1a29f74a900d5586a7d0af828107a37eab66ad0/pkg/detectors/jiratoken/v1/jiratoken.go#L56-L59
@rgmz I believe this check is to prevent index errors like the one demonstrated here https://go.dev/play/p/MS8AyzGb1uq. Rather than removing the check, it would be better if we change the condition to if len(match) < 2
to ensure at least one capture group is present in the regex.
That still has the problem of silently ignoring legitimate errors. I believe that logic errors should explicitly fail, rather than quietly continuing.
That still has the problem of silently ignoring legitimate errors. I believe that logic errors should explicitly fail, rather than quietly continuing.
I disagree. It's less of a logic error and more of a incorrectly written regex issue. A single index error on a detector shouldn't cause the whole scan to fail imo.
Some patterns are precise and don't have any capture groups, so you'd only have one match (the entire matching string).
A single index error on a detector shouldn't cause the whole scan to fail imo.
This error means that the detector can never succeed and will miss valid secrets. It would be easy to detect in the build/test process, so a user should never actually ecounter this issue.
Some patterns are precise and don't have any capture groups, so you'd only have one match (the entire matching string).
Then the check could be safely removed if present.
For me to feel comfortable merging this blanket removal would require a test that ensures we would not get an index out of range error.
It would be easy to detect in the build/test process, so a user should never actually ecounter this issue.
I agree with this
For me to feel comfortable merging this blanket removal would require a test that ensures we would not get an index out of range error.
Ideally, each detector would have unit tests but only a few do at the moment.
The TestX_FromChunk
case should cover this, however, it isn't clear what triggers those cases. They don't seem to run with test
:
https://github.com/trufflesecurity/trufflehog/actions/runs/8912166932/job/24475012246