trufflehog
Secrets are reported on the wrong line
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
TruffleHog Version
3.59.0 (and older versions)
Trace Output
https://gist.github.com/det/080c98039750a5296c6856efaaed8b5c
Expected Behavior
Secret should be reported on line 557
Actual Behavior
Secret is reported on line 287 (and a different wrong line number on older versions of trufflehog)
Steps to Reproduce
- wget https://gist.githubusercontent.com/det/1526b4c16d0e07ac023d75c912a68658/raw/c3061c14a811205a65cbdcf0065bd3c11d88bfcb/test.txt
- trufflehog filesystem test.txt
- The wrong line number is reported
Environment
- OS: Linux
References
May be related to #1537
I just tested, and this problem persists even with #1891 merged.
#1891 appears to be fixing an off-by-one error - not whatever's causing this.
This appears to be coming from the Chunker logic. A quick change of ChunkSize to 10 * 10 * 1024 returns the correct line number.
Found unverified result 🐷🔑❓
Detector Type: Github
Decoder Type: PLAIN
Raw result: ghs_012345678901234567890123456789012345
Rotation_guide: https://howtorotate.com/docs/tutorials/github/
File: a.txt
Line: 557
If bumping ChunkSize from 10KiB to 100KiB fixes the issue, then that implies to me that:
- trufflehog is not reporting the line number within the file, it is reporting the line number within a given chunk
- bumping from 10KiB to 100KiB would only solve the problem for smaller files, and line numbers for secrets after the first 100KiB of a file will still be wrong
Also, presumably the Chunker was implemented for performance reasons (I'm guessing because there are so many detectors that are each running their own regex matching per chunk?) - what implications does bumping from 10KiB to 100KiB have for that?
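The chunk-relative hypothesis above can be illustrated with a minimal sketch. This is not trufflehog's Go implementation: `chunk_bytes`, `line_in_chunk`, and `find_secret` are hypothetical names, the chunker here has no overlap between chunks, and the 128-byte filler lines are an assumption chosen to keep the numbers easy to check.

```python
def chunk_bytes(data: bytes, chunk_size: int):
    """Split data into fixed-size chunks with no awareness of line boundaries."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def line_in_chunk(chunk: bytes, offset: int) -> int:
    """1-based line number of `offset`, counted from the start of the chunk only."""
    return chunk.count(b"\n", 0, offset) + 1


def find_secret(data: bytes, secret: bytes, chunk_size: int):
    """Return the chunk-relative line of the first match, or None if not found."""
    for chunk in chunk_bytes(data, chunk_size):
        idx = chunk.find(secret)
        if idx != -1:
            return line_in_chunk(chunk, idx)
    return None


# Eight 128-byte filler lines (assumed size), then the secret on line 9.
filler = (b"//" + b"x" * 125 + b"\n") * 8
secret = b"ghs_" + b"1" * 36
data = filler + b'const token = "' + secret + b'";\n'

small = find_secret(data, secret, chunk_size=512)        # secret lands in a later chunk
large = find_secret(data, secret, chunk_size=10 * 1024)  # whole file fits in one chunk
print(small, large)
```

With the small chunk size the count restarts at the chunk boundary, so the reported line is too low. With a chunk large enough to hold the whole file the number happens to be right, which matches the observation that a bigger ChunkSize only hides the problem for small files.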
Here's another repro:
1 // this block is xxxxxxxxxxxxxxxxx 1024KiB total xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
2 // this line ix xxxxxxxxxxxxxxxxxxxx 128KiB total xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
3 // this line ix xxxxxxxxxxxxxxxxxxxx 128KiB total xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
4 // this line ix xxxxxxxxxxxxxxxxxxxx 128KiB total xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
5 // this line ix xxxxxxxxxxxxxxxxxxxx 128KiB total xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
6 // this line ix xxxxxxxxxxxxxxxxxxxx 128KiB total xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
7 // this line ix xxxxxxxxxxxxxxxxxxxx 128KiB total xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
8 // this line ix xxxxxxxxxxxxxxxxxxxx 128KiB total xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
9 const token = "ghs_111111111111111111111111111111111111";
$ trufflehog-3.48.0 filesystem --json --fail --no-verification --no-update --exclude-detectors=PagerDutyApiKey,LaunchDarkly foo.ts
{"SourceMetadata":{"Data":{"Filesystem":{"file":"foo.ts","line":4}}},"SourceID":1,"SourceType":15,"SourceName":"trufflehog - filesystem","DetectorType":8,"DetectorName":"Github","DecoderName":"PLAIN","Verified":false,"Raw":"ghs_111111111111111111111111111111111111","RawV2":"","Redacted":"","ExtraData":null,"StructuredData":null}
{"SourceMetadata":{"Data":{"Filesystem":{"file":"foo.ts","line":8}}},"SourceID":1,"SourceType":15,"SourceName":"trufflehog - filesystem","DetectorType":8,"DetectorName":"Github","DecoderName":"PLAIN","Verified":false,"Raw":"ghs_111111111111111111111111111111111111","RawV2":"","Redacted":"","ExtraData":null,"StructuredData":null}
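To check a run like this without reading the JSON by eye, the reported line numbers can be pulled out of the output. The sketch below assumes only the `SourceMetadata.Data.Filesystem` fields visible in the output above; `reported_lines` is a hypothetical helper, and the sample is trimmed to the fields it needs.

```python
import json


def reported_lines(jsonl: str):
    """Return (file, line) for each JSON finding, one finding per input line."""
    results = []
    for raw in jsonl.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        finding = json.loads(raw)
        fs = finding["SourceMetadata"]["Data"]["Filesystem"]
        results.append((fs["file"], fs["line"]))
    return results


# Trimmed copy of the two findings above.
sample = (
    '{"SourceMetadata":{"Data":{"Filesystem":{"file":"foo.ts","line":4}}},"DetectorName":"Github"}\n'
    '{"SourceMetadata":{"Data":{"Filesystem":{"file":"foo.ts","line":8}}},"DetectorName":"Github"}\n'
)

expected_line = 9  # the token is on line 9 of foo.ts
mismatches = [(f, n) for f, n in reported_lines(sample) if n != expected_line]
print(mismatches)
```

Both findings show up as mismatches, since neither reports line 9.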
The reason for this is that filesystem doesn't do any special chunking. Git-based sources maintain line numbers because the git source does its own line-aware chunking. We should add that logic to the general chunker, or maybe to the util package so any source can utilize it.
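One way a general chunker could preserve line numbers is to hand each chunk out together with the file line it starts on. The following is only a sketch of that idea, not the git source's actual logic: `chunks_with_line_offsets` and `find_line_in_file` are hypothetical names, and chunk overlap is left out for brevity.

```python
def chunks_with_line_offsets(data: bytes, chunk_size: int):
    """Yield (first_line, chunk); first_line is the 1-based file line the chunk starts on."""
    first_line = 1
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        yield first_line, chunk
        # Every newline in this chunk pushes the next chunk one line further down.
        first_line += chunk.count(b"\n")


def find_line_in_file(data: bytes, secret: bytes, chunk_size: int):
    """Return the file-relative line of the first match, or None if not found."""
    for first_line, chunk in chunks_with_line_offsets(data, chunk_size):
        idx = chunk.find(secret)
        if idx != -1:
            return first_line + chunk.count(b"\n", 0, idx)
    return None


# Eight 128-byte filler lines (assumed size), then the secret on line 9.
filler = (b"//" + b"x" * 125 + b"\n") * 8
secret = b"ghs_" + b"1" * 36
data = filler + b'const token = "' + secret + b'";\n'

for size in (100, 512, 10 * 1024):
    print(size, find_line_in_file(data, secret, size))
```

Because the running newline count travels with each chunk, the reported line no longer depends on the chunk size, as long as the secret does not straddle a chunk boundary.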
> The reason for this is that filesystem doesn't do any special chunking

What is "this" in the context of your reply?
The question I'm looking to answer is "why does trufflehog filesystem reproducibly report the wrong line numbers in the described situations?" and so far the only answer suggested (that points the finger at chunking) doesn't make sense.
I have the same issue in filesystem mode: the line number calculation is wrong. Version 3.75.1.