gitleaks icon indicating copy to clipboard operation
gitleaks copied to clipboard

Catch common pattern errors with `\b`

Open rgmz opened this issue 1 year ago • 4 comments

Description:

The intention of this change is to detect common pattern errors (specifically when using \b).

For example, given the problematic pattern glpat-[\w-]{20}\b, the following error will be produced:

$ go generate ./...
{"level":"fatal","error":"regex contains word boundary (\\b) beside non-word character (i.e., not [A-Za-z0-9_]): glpat-[\\w-]{20}\\b","rule-id":"gitlab-pat","time":"2024-09-09T13:34:24-04:00","message":"Failed to validate rule"}
exit status 1
cmd/generate/config/main.go:17: running "go": exit status 1

I've discussed why this is an issue in a few different places, most recently: https://github.com/leaktk/patterns/pull/41#discussion_r1723758978

Open Questions

  • [ ] Should something permission like (?:.|\s){0,200} count as a non-word character? I am leaning towards yes because it's a lack of specificity, but am stuck.

    The Kubernetes pattern fails if "anything" is considered a non-word character. https://github.com/gitleaks/gitleaks/blob/e93a7c0d2604fd1bcc43ac9cac6144a62387a8a4/cmd/generate/config/rules/kubernetes.go#L48

  • [ ] Should this fail or simply log a warning?

  • [ ] Check for two word characters together (a\bc) which is also invalid

  • [ ] Always populate before/after for better debugging

Checklist:

  • [x] Does your PR pass tests?
  • [x] Have you written new tests for your changes?
  • [x] Have you lint your code locally prior to submission?

rgmz avatar Sep 09 '24 17:09 rgmz

This PR isn't ready to merge but does need some feedback.

rgmz avatar Sep 12 '24 22:09 rgmz

Should this fail or simply log a warning?

I think this is safe to fail since it only gets called during the generation of the config and not during scans.

Should something permission like (?:.|\s){0,200} count as a non-word character? I am leaning towards yes because it's a lack of specificity, but am stuck.

That's a tough one, I'm not sure either.

Overall I like this change since it enhances already existing guardrails during the config generation.

zricethezav avatar Sep 28 '24 00:09 zricethezav

@bplaxco Some interesting results after testing against leaktk/patterns:

8:45AM FTL Failed to load config error="regex contains word boundary (\\b) '' non-word character (i.e., not [A-Za-z0-9_]): before=-, after=, segment=(?i:\\bXOXB-[0-9]{10,13}-[0-9]{10,13}[\\-0-9A-Za-zſK]*\\b)"

8:46AM FTL Failed to load config error="regex contains word boundary (\\b) '' non-word character (i.e., not [A-Za-z0-9_]): before=-, after=, segment=\\bglpat-[\\-0-9A-Z_a-z]{20}\\b"

8:47AM FTL Failed to load config error="regex contains word boundary (\\b) '' non-word character (i.e., not [A-Za-z0-9_]): before=-, after=, segment=\\b(?:glrt-|GR1348941)[\\-0-9A-Z_a-z]{20}\\b"

8:47AM FTL Failed to load config error="regex contains word boundary (\\b) '' non-word character (i.e., not [A-Za-z0-9_]): before=-, after=, segment=(?i:\\bXOX[EPep](?:-[0-9]{10,13}){3}-[\\-0-9A-Za-zſK]{28,34}\\b)"

8:48AM FTL Failed to load config error="regex contains word boundary (\\b) '' non-word character (i.e., not [A-Za-z0-9_]): before=\x00-\b, after=, segment=(?-s:\\boc[\\t\\n\\f\\r ]+login[\\t\\n\\f\\r ]+.*?--token[\\t\\n\\f\\r ]*=?[\\t\\n\\f\\r ]*(sha256~[^\\t\\n\\f\\r <]{16,})\\b)"

8:50AM FTL Failed to load config error="regex contains word boundary (\\b) '' non-word character (i.e., not [A-Za-z0-9_]): before=-, after=, segment=\\bSG\\.[\\-0-9A-Z_a-z]{16,32}\\.[\\-0-9A-Z_a-z]{16,64}\\b"

8:50AM FTL Failed to load config error="regex contains word boundary (\\b) '' non-word character (i.e., not [A-Za-z0-9_]): before=, after=, segment=(?i:\\bXOXE-[0-9]-[0-9A-Za-zſK]{146}\\bJ)"

8:58AM FTL Failed to load config error="regex contains word boundary (\\b) '' non-word character (i.e., not [A-Za-z0-9_]): before=+, after=, segment=(?i:AWS[\\-0-9A-Z_a-zſK]{0,32}[\"']?[\\t\\n\\f\\r ]*?[\\(:=][\\t\\n\\f\\r ]*?[\"']?([\\+/-9A-Za-zſK]{40})\\b)"

(The output could be a lot clearer/more friendly. Basically, /b is in an invalid position, potentially missing results.)

rgmz avatar Apr 14 '25 12:04 rgmz

@bplaxco Some interesting results after testing against leaktk/patterns:

Thanks for pointing those out :ok_hand: I'll update the patterns and my pattern compiler to check for that ^_^

bplaxco avatar May 04 '25 19:05 bplaxco