Dancer2
Regex metachars have special meaning in string routes
I am surprised to find that the following are equivalent:
get q{/foo/(?<id>\d+)} => sub{...};
get qr{/foo/(?<id>\d+)} => sub{...};
Fortunately, it looks like '.' does not have its regular expression meaning, so e.g. 'foo.json' won't match 'fooljson'.
However, other metacharacters may cause problems. Ordinary brackets (parentheses) are legitimate in URLs, so:
get q{/a(b)} => sub{...}; # does not match /a(b)
get q{/a\(b\)} => sub{...}; # matches /a(b)
While brackets are rare in URLs, this may surprise people who put arbitrary text in their routes; see e.g. http://advent.perldancer.org/2014/23, which at one point suggests interpolating values from a database into the route.
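To make the mismatch concrete, here is a plain-Perl sketch that does not use Dancer2 at all. It assumes a string route ends up as an anchored pattern; that assumption is only inferred from the behaviour reported above, and matches_as_pattern is a hypothetical helper, not part of Dancer2.

```perl
use strict;
use warnings;

# Assumption: a string route is compiled into an anchored regex.
# This only mimics the reported behaviour; it is not Dancer2's code.
sub matches_as_pattern {
    my ($route, $path) = @_;
    return $path =~ /\A(?:$route)\z/ ? 1 : 0;
}

my $path = '/a(b)';

# Unescaped: "(b)" is a capture group, so the pattern matches "/ab" instead.
print matches_as_pattern('/a(b)', $path), "\n";     # 0
print matches_as_pattern('/a(b)', '/ab'), "\n";     # 1

# Escaped by hand, as in the second route above.
print matches_as_pattern('/a\(b\)', $path), "\n";   # 1

# quotemeta escapes every non-word character, which suits arbitrary text
# (for example a value read from a database) placed into a pattern.
my $escaped = quotemeta('/a(b)');
print matches_as_pattern($escaped, $path), "\n";    # 1
```

Whether quotemeta is safe inside a string route (as opposed to a qr{} route) depends on what escaping Dancer2 already applies to strings, which this sketch does not establish.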
Is this an intentional feature?
If so, is it documented anywhere?
The closest thing I can find in the docs is the regex-in-string-form matching the user agent in the example at https://metacpan.org/pod/distribution/Dancer2/lib/Dancer2/Manual.pod#Conditional-Matching - and the rules for this are not explained.
[...] Other characters are unsafe because gateways and other transport agents are known to sometimes modify such characters. These characters are "{", "}", "|", "\", "^", "~", "[", "]", and "`".
All unsafe characters must always be encoded within a URL. [...]
-- RFC 1738 2.2: URL Character Encoding Issues.
This means that while we should allow '(' and ')', this does not extend to all characters used in regular expressions.
If we're currently not quoting strings correctly, we should write a test that covers how we think it should behave, and then fix it.
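A minimal sketch of what such a test might look like, using Plack::Test and HTTP::Request::Common. The package name ParenRouteApp, the routes, and the status_for helper are illustrative assumptions. The parenthesis checks are marked TODO because, per the report above, they are not expected to pass yet.

```perl
use strict;
use warnings;
use Test::More;
use Plack::Test;
use HTTP::Request::Common qw(GET);

{
    # Hypothetical test application; the name and routes are illustrative.
    package ParenRouteApp;
    use Dancer2;

    get '/a(b)'     => sub { 'parens' };
    get '/foo.json' => sub { 'dot' };
}

my $test = Plack::Test->create( ParenRouteApp->to_app );

# Return the HTTP status code the app gives for a request path.
sub status_for {
    my ($path) = @_;
    return $test->request( GET($path) )->code;
}

# Behaviour the report describes as already working.
is status_for('/foo.json'), 200, 'dot in a string route matches a literal dot';
is status_for('/fooljson'), 404, 'dot in a string route is not a wildcard';

TODO: {
    local $TODO = 'string routes reportedly treat ( and ) as regex syntax';

    is status_for('/a(b)'), 200, 'parentheses in a string route match literally';
    is status_for('/ab'),   404, 'parentheses do not form a capture group';
}

done_testing;
```

Once the quoting is fixed, the TODO wrapper can be dropped so the parenthesis checks become ordinary assertions.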