trufflehog
re2 error: `re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35`
TruffleHog Version
Trace Output
re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35
re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35
re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35
re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35
re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35
re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35
re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35
re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35
re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35
Expected Behavior
The chunk data should be scanned.
Actual Behavior
TruffleHog outputs the aforementioned error from re2, making it unclear what the cause is and whether certain chunks were skipped.
Steps to Reproduce
The error seems semi-random, so it's difficult to reproduce. Additionally, the log line comes directly from re2.cc, meaning there is no context associated with it.
Environment
- OS: [e.g. iOS]
- Version [e.g. 22]
Additional Context
https://github.com/google/re2/issues/186
References
- #0000
It might be worthwhile to give users an option to pick which regex engine they want, re2 or the default, since re2 is a drop-in replacement for regex.
It couldn't hurt given #2354. I think this specific error is caused by the configured max_mem for re2 being smaller than TruffleHog's maximum diff size.
https://github.com/trufflesecurity/trufflehog/blob/2888f8cdfcb1b70f1814dc223d17d45fc4eebb20/pkg/gitparse/gitparse.go#L27-L28
The whole diff is never scanned; we use a sliding-window-with-overlap chunker to break the data into more manageable chunks:
https://github.com/trufflesecurity/trufflehog/blob/333c4f52961bf1d06d04a82fbdea35a796d102db/pkg/sources/chunker.go#L13-L18
Looks like the default max_mem is 8MB, so I'm guessing we have an expensive regex on some data?
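For readers unfamiliar with the approach, here is a minimal sketch of a sliding-window-with-overlap chunker. It is not TruffleHog's actual chunker.go; the function name chunkWithOverlap and the tiny chunkSize/peekSize values are assumptions chosen so the overlap is easy to see.

```go
package main

import "fmt"

// Hypothetical sizes for illustration only; they are not TruffleHog's values.
const (
	chunkSize = 8 // bytes of new data consumed per window
	peekSize  = 3 // bytes of overlap borrowed from the following window
)

// chunkWithOverlap splits data into windows of at most chunkSize+peekSize
// bytes. Each window starts chunkSize bytes after the previous one, so
// consecutive windows share up to peekSize bytes. The overlap lets a match
// that straddles a window boundary still appear whole in one window.
func chunkWithOverlap(data []byte, chunkSize, peekSize int) [][]byte {
	if chunkSize <= 0 || peekSize < 0 {
		return nil
	}
	var chunks [][]byte
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize + peekSize
		if end >= len(data) {
			// Final window: take whatever remains and stop.
			chunks = append(chunks, data[start:])
			break
		}
		chunks = append(chunks, data[start:end])
	}
	return chunks
}

func main() {
	data := []byte("abcdefghijklmnopqrstuvwxyz")
	for i, c := range chunkWithOverlap(data, chunkSize, peekSize) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```

With this scheme the regex engine only ever sees one bounded window at a time, which is why the size of the full diff should not by itself determine how much memory a match needs.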
Unfortunately, this seems to be a transient error. I've attempted to re-scan orgs/repos where I encountered it but haven't been able to reproduce it (so far).
It might be possible for wasilibs/go-re2 to catch failures from the underlying RE2::Match method and log additional context.
https://github.com/google/re2/blob/b7e96b34c0945fccb8b5282404f82c7ab0843717/re2/re2.cc#L772-L777
This issue should be resolved with this pull request since the data fed to the regex will not exceed 4kB.