trufflehog: Secrets are reported on the wrong line

TruffleHog Version

3.59.0 (and older versions)

Trace Output

https://gist.github.com/det/080c98039750a5296c6856efaaed8b5c

Expected Behavior

Secret should be reported on line 557

Actual Behavior

Secret is reported on line 287 (and a different wrong line number on older versions of trufflehog)

Steps to Reproduce

1. wget https://gist.githubusercontent.com/det/1526b4c16d0e07ac023d75c912a68658/raw/c3061c14a811205a65cbdcf0065bd3c11d88bfcb/test.txt
2. trufflehog filesystem test.txt
3. The wrong line number is reported

Environment

OS: Linux

References

May be related to #1537

Oct 08 '23 18:10 det

I just tested, and this problem persists even with #1891 merged.

Oct 18 '23 21:10 det

#1891 appears to fix an off-by-one error, not whatever is causing this.

Oct 18 '23 22:10 sxlijin

This appears to come from the Chunker logic. As a quick test, changing ChunkSize to 10 * 10 * 1024 (100 KiB) returns the correct line number:

Found unverified result 🐷🔑❓
Detector Type: Github
Decoder Type: PLAIN
Raw result: ghs_012345678901234567890123456789012345
Rotation_guide: https://howtorotate.com/docs/tutorials/github/
File: a.txt
Line: 557

Oct 19 '23 00:10 shreyas-sriram

If bumping ChunkSize from 10KiB to 100KiB fixes the issue, then that implies to me that:

- trufflehog is not reporting the line number within the file; it is reporting the line number within a given chunk
- bumping from 10KiB to 100KiB would only solve the problem for smaller files, and line numbers for secrets after the first 100KiB of a file will still be wrong

Also, presumably the Chunker was implemented for performance reasons (I'm guessing because there are so many detectors, each running its own regex matching per chunk?). What implications does bumping from 10KiB to 100KiB have for that?

Oct 19 '23 18:10 sxlijin

Here's another repro:

     1  // this block is xxxxxxxxxxxxxxxxx 1024KiB total xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
     2  // this line ix xxxxxxxxxxxxxxxxxxxx 128KiB total xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
     3  // this line ix xxxxxxxxxxxxxxxxxxxx 128KiB total xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
     4  // this line ix xxxxxxxxxxxxxxxxxxxx 128KiB total xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
     5  // this line ix xxxxxxxxxxxxxxxxxxxx 128KiB total xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
     6  // this line ix xxxxxxxxxxxxxxxxxxxx 128KiB total xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
     7  // this line ix xxxxxxxxxxxxxxxxxxxx 128KiB total xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
     8  // this line ix xxxxxxxxxxxxxxxxxxxx 128KiB total xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
     9  const token = "ghs_111111111111111111111111111111111111";

$ trufflehog-3.48.0 filesystem --json --fail --no-verification --no-update --exclude-detectors=PagerDutyApiKey,LaunchDarkly foo.ts
{"SourceMetadata":{"Data":{"Filesystem":{"file":"foo.ts","line":4}}},"SourceID":1,"SourceType":15,"SourceName":"trufflehog - filesystem","DetectorType":8,"DetectorName":"Github","DecoderName":"PLAIN","Verified":false,"Raw":"ghs_111111111111111111111111111111111111","RawV2":"","Redacted":"","ExtraData":null,"StructuredData":null}
{"SourceMetadata":{"Data":{"Filesystem":{"file":"foo.ts","line":8}}},"SourceID":1,"SourceType":15,"SourceName":"trufflehog - filesystem","DetectorType":8,"DetectorName":"Github","DecoderName":"PLAIN","Verified":false,"Raw":"ghs_111111111111111111111111111111111111","RawV2":"","Redacted":"","ExtraData":null,"StructuredData":null}

Oct 19 '23 21:10 sxlijin

The reason for this is that filesystem doesn't do any special chunking. Git-based sources keep correct line numbers because the git source does its own line-aware chunking. We should add that logic to the general chunker, or maybe to the util package so any source can use it.

Oct 24 '23 15:10 bill-rich

The reason for this is that filesystem doesn't do any special chunking

What is "this" in the context of your reply?

The question I'm looking to answer is "why does trufflehog filesystem reproducibly report the wrong line numbers in the described situations?", and so far the only answer suggested (which points the finger at chunking) doesn't make sense.

Nov 01 '23 21:11 sxlijin

I have the same issue in filesystem mode: the line number calculation is wrong. Version 3.75.1.

May 07 '24 17:05 Yullia
