Parameter customization
In versions < 8 it was possible to customize the parameter match using a regex, for example :param(a|b|c). While this was immensely helpful for end users, it created a security issue whenever a user managed to write a vulnerable regex. This is incredibly hard for users to avoid, because individual parameters may not be vulnerable to ReDoS on their own but can produce something unsafe once combined.
This issue is about bringing back the same parameter match override (the part after :param between parens) without re-introducing the vulnerability. Two approaches I can think of are:
- Whitelist a subset of regex features that can be safely combined
- Add a DSL that translates into a safe regex
I think the best path forward is likely a regex-like DSL, given that regex itself can behave differently depending on the flags used, and there's no control over the flags in this library (e.g. unicode support). For the majority of people, it would just be regex anyway.
An additional benefit of having a little more control over the customization, instead of directly passing a regex through, is support for encodePath. In older versions there was a disconnect because the path itself was encoded for matches but the regex was not, so something trivial like /^ and /:param(^) resulted in different regexes.
I don't mind contributing this. What should the DSL support, and what constitutes a safe/unsafe regex?
what should the DSL support
I think the safe place to start would be standard regex features via a whitelist. We could start with just literal text (no regex) and |, then add more features such as \d, \w, etc. Repeated characters are where unsafe regexes start to matter.
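A rough sketch of that first step, assuming the DSL only knows literal text and |. The names escapeLiteral and compileAlternatives are made up for illustration and are not part of the library:

```typescript
// Hypothetical helper, not the library's API.
// Escapes every character that has special meaning in a JavaScript regex.
function escapeLiteral(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");
}

// Translates a pattern made only of literal text and `|` into a regex source.
// Every alternative is escaped, so no quantifier or group can come from user input.
function compileAlternatives(pattern: string): string {
  const alternatives = pattern.split("|");
  if (alternatives.some((alt) => alt.length === 0)) {
    throw new TypeError(`Empty alternative in "${pattern}"`);
  }
  return `(?:${alternatives.map(escapeLiteral).join("|")})`;
}

const source = compileAlternatives("a|b|c"); // "(?:a|b|c)"
const matcher = new RegExp(`^/${source}$`);
```

Since the output contains no quantifiers at all, it cannot introduce repetition on its own. Anything beyond literals and | would need its own analysis.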
what constitutes a safe/unsafe regex
Whether it could cause backtracking. Mostly it's a question of whether this regex could mix with the previously generated text to create a catastrophic backtracking situation.
I have a work-in-progress here.
I think the allowed regex features make sense:
- Only non-capturing groups allowed with (?:, no nested groups, no lookahead, lookbehind, etc. to worry about.
- Character classes allowed
- Quantifiers allowed
- No backreferences
- No anchors
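To make the rules above concrete, here is a minimal sketch of a validator for them. validateParamPattern is a hypothetical name and this is not the work-in-progress code. It only checks the shape of the pattern and does not decide whether the quantifiers it lets through are safe to combine:

```typescript
// Hypothetical validator for the rules listed above (illustration only).
// Throws a TypeError on the first construct that is not allowed.
function validateParamPattern(pattern: string): void {
  let depth = 0;
  let inClass = false;
  for (let i = 0; i < pattern.length; i++) {
    const char = pattern[i];
    if (char === "\\") {
      const next = pattern[i + 1];
      if (next === undefined) throw new TypeError("Trailing backslash");
      // Outside a class, \1..\9 and \k<name> are backreferences.
      if (!inClass && (/[1-9]/.test(next) || next === "k")) {
        throw new TypeError(`Backreference at index ${i}`);
      }
      // Outside a class, \b and \B are word-boundary anchors.
      if (!inClass && (next === "b" || next === "B")) {
        throw new TypeError(`Anchor at index ${i}`);
      }
      i++; // Skip the escaped character.
      continue;
    }
    if (inClass) {
      if (char === "]") inClass = false;
      continue;
    }
    if (char === "[") {
      inClass = true;
      continue;
    }
    if (char === "^" || char === "$") {
      throw new TypeError(`Anchor at index ${i}`);
    }
    if (char === "(") {
      // Rejects capturing groups, named groups, lookahead and lookbehind.
      if (pattern.slice(i, i + 3) !== "(?:") {
        throw new TypeError(`Only (?: groups are allowed (index ${i})`);
      }
      if (depth > 0) throw new TypeError(`Nested group at index ${i}`);
      depth++;
      i += 2; // Skip the "?:" that follows the parenthesis.
      continue;
    }
    if (char === ")") {
      if (depth === 0) throw new TypeError(`Unbalanced ")" at index ${i}`);
      depth--;
    }
  }
  if (inClass || depth !== 0) {
    throw new TypeError("Unterminated character class or group");
  }
}
```

For example, validateParamPattern("(?:a|b)\\d+") passes, while "(a)", "(?=a)", "\\1" and "^a" all throw.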
I'm just a bit confused about 2 things:
- How to add the parameter regex back into the main regex? (I currently just put it in without any checks)
- How to deal with delimiters? For example /user/:id([^x]+) would gobble up /user/12345/related, I'm not sure if that's something we're supposed to prevent somehow?
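To illustrate the delimiter question, here is a small sketch of the gobbling behaviour next to one possible guard. The lookahead approach is only an assumption about what could work, not something the library does, and it only applies when the parameter pattern is a single character class with a quantifier:

```typescript
// Naive insertion: the parameter pattern is dropped into the route regex as-is.
const naive = new RegExp("^/user/([^x]+)$");

// Hypothetical guard: a negative lookahead stops each repetition at the delimiter.
const delimiter = "/";
const guarded = new RegExp(`^/user/((?:(?!${delimiter})[^x])+)$`);

const gobbled = naive.exec("/user/12345/related")?.[1]; // "12345/related"
const blocked = guarded.test("/user/12345/related"); // false
const kept = guarded.exec("/user/12345")?.[1]; // "12345"
```

A simpler alternative would be to reject any match whose captured value contains the delimiter, at the cost of doing the check after the regex has already run.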
Any feedback appreciated :)