re2 error: `re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35`

Open rgmz opened this issue 10 months ago • 4 comments

TruffleHog Version

Trace Output

re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35
re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35
re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35
re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35
re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35
re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35
re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35
re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35
re2/re2.cc:772: DFA out of memory: pattern length 102, program size 928, list count 352, bytemap range 35

Expected Behavior

The chunk data should be scanned.

Actual Behavior

TruffleHog outputs the aforementioned error from re2, making it unclear what the cause is and whether certain chunks were skipped.

Steps to Reproduce

The error seems semi-random so it's difficult to reproduce. Additionally, the log comes directly from re2.cc, meaning there is no context associated with it.

Environment

  • OS: [e.g. iOS]
  • Version [e.g. 22]

Additional Context

https://github.com/google/re2/issues/186

References

  • #0000

rgmz avatar Apr 24 '24 11:04 rgmz

Maybe it would be worthwhile to give users an option to pick which regex engine they want, re2 or the default, since re2 is a drop-in replacement for regex.
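
A minimal sketch of what such an engine switch could look like, using only the standard library. The `Matcher` interface, the `engines` registry, and the engine names are hypothetical and are not taken from TruffleHog. An "re2" entry could presumably be registered the same way by wrapping the go-re2 compile function, but that is an assumption and is left out so the example stays dependency-free:

```go
package main

import (
	"fmt"
	"regexp"
)

// Matcher is a hypothetical subset of the regexp API that a detector might need.
type Matcher interface {
	MatchString(s string) bool
	FindAllString(s string, n int) []string
}

// compileFunc builds a Matcher from a pattern.
type compileFunc func(pattern string) (Matcher, error)

// compileStd compiles a pattern with Go's standard regexp package.
func compileStd(pattern string) (Matcher, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	return re, nil
}

// engines maps a user-facing engine name to its compiler.
// Only the standard engine is registered here; an "re2" entry
// would be added alongside it (assumption, not shown).
var engines = map[string]compileFunc{
	"default": compileStd,
}

// compile looks up the requested engine and compiles the pattern with it.
func compile(engine, pattern string) (Matcher, error) {
	c, ok := engines[engine]
	if !ok {
		return nil, fmt.Errorf("unknown regex engine %q", engine)
	}
	return c(pattern)
}

func main() {
	m, err := compile("default", `AKIA[0-9A-Z]{16}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(m.FindAllString("key=AKIAABCDEFGHIJKLMNOP", -1))
}
```

Keeping detectors behind a small interface like this means the engine choice stays in one place, so falling back to the default engine when re2 misbehaves would not require touching each detector.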

zricethezav avatar May 01 '24 16:05 zricethezav

It couldn't hurt given #2354. I think this specific error is caused by the configured max_mem for re2 being smaller than TruffleHog's maximum diff size.

https://github.com/trufflesecurity/trufflehog/blob/2888f8cdfcb1b70f1814dc223d17d45fc4eebb20/pkg/gitparse/gitparse.go#L27-L28

rgmz avatar May 01 '24 16:05 rgmz

The whole diff is never scanned; we use a sliding-window-with-overlap chunker to break up data into more manageable chunks:

https://github.com/trufflesecurity/trufflehog/blob/333c4f52961bf1d06d04a82fbdea35a796d102db/pkg/sources/chunker.go#L13-L18
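
For readers unfamiliar with the technique, here is a rough sketch of a sliding window with overlap. The function name `chunkWithOverlap` and the sizes used are illustrative assumptions, not the constants or code from the linked file:

```go
package main

import "fmt"

// chunkWithOverlap splits data into windows that start every `size` bytes.
// Each window is extended by up to `peek` bytes of lookahead, so a match
// that straddles a window boundary still appears whole in one chunk.
// Both sizes are parameters here; the values in main are illustrative only.
func chunkWithOverlap(data []byte, size, peek int) [][]byte {
	if size <= 0 || peek < 0 {
		return nil
	}
	var chunks [][]byte
	for start := 0; start < len(data); start += size {
		end := start + size + peek
		if end > len(data) {
			end = len(data)
		}
		chunks = append(chunks, data[start:end])
	}
	return chunks
}

func main() {
	data := []byte("abcdefghij")
	for i, c := range chunkWithOverlap(data, 4, 2) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```

With a window of 4 and a lookahead of 2, the 10-byte input yields the chunks "abcdef", "efghij", and "ij". The bound on what any single regex call sees is `size + peek`, regardless of how large the diff is.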

Looks like the default max_mem is 8MB, so I'm guessing we have an expensive regex on some data?

dustin-decker avatar May 13 '24 15:05 dustin-decker

Unfortunately, this seems to be a transient error. I've attempted to re-scan orgs/repos where I encountered it but haven't been able to reproduce it (so far).

It might be possible for wasilibs/go-re2 to catch failures from the underlying RE2::Match method and log additional context.

https://github.com/google/re2/blob/b7e96b34c0945fccb8b5282404f82c7ab0843717/re2/re2.cc#L772-L777

rgmz avatar May 14 '24 23:05 rgmz

> The whole diff is never scanned; we use a sliding-window-with-overlap chunker to break up data into more manageable chunks:
>
> https://github.com/trufflesecurity/trufflehog/blob/333c4f52961bf1d06d04a82fbdea35a796d102db/pkg/sources/chunker.go#L13-L18
>
> Looks like the default max_mem is 8MB, so I'm guessing we have an expensive regex on some data?

This issue should be resolved with this pull request since the data fed to the regex will not exceed 4kB.

ahrav avatar Jun 08 '24 22:06 ahrav