Dancer2

Regex metachars have special meaning in string routes

Open: pdl opened this issue 8 years ago • 1 comment

I am surprised to find that the following are equivalent:

get  q{/foo/(\<id>\d+)} => sub{...};
get qr{/foo/(\<id>\d+)} => sub{...};

Fortunately, it looks like . does not have its regular expression meaning, so e.g. 'foo.json' won't match fooljson.

However, there may be other metacharacters that cause problems. Ordinary brackets (parentheses) are legitimate in URLs, so:

get q{/a(b)} => sub{...}; # does not match /a(b)
get q{/a\(b\)} => sub{...}; # matches /a(b)
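The behaviour above can be reproduced outside Dancer2 with a rough sketch. The helper `naive_route_regex` is hypothetical and is not Dancer2's actual implementation; it only assumes, based on the observations in this report, that a string route has its `.` escaped and is otherwise compiled as an anchored regular expression.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the string-to-regex step (NOT Dancer2's real code):
# escape only ".", then anchor the rest as a regular expression.
sub naive_route_regex {
    my ($route) = @_;
    (my $pattern = $route) =~ s/\./\\./g;
    return qr{\A$pattern\z};
}

my $unescaped = naive_route_regex('/a(b)');      # "(b)" becomes a capture group
my $escaped   = naive_route_regex('/a\(b\)');    # "\(" and "\)" are literal parens

# Compare how each pattern treats the literal path and the "group" reading.
for my $path ('/a(b)', '/ab') {
    printf "%-6s unescaped: %-3s escaped: %s\n", $path,
        ($path =~ $unescaped ? 'yes' : 'no'),
        ($path =~ $escaped   ? 'yes' : 'no');
}
```

Under that assumption, the unescaped route matches `/ab` rather than `/a(b)`, which is consistent with the two `get` lines above.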

While brackets are rare in URLs, this might be surprising for people who have arbitrary text in their routes, e.g. see http://advent.perldancer.org/2014/23 - which at one point suggests interpolating values from a db into the route.
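For anyone interpolating arbitrary text into a route today, one possible workaround is to build an explicit regex and quote the interpolated part with `\Q...\E` (Perl's `quotemeta`). The sketch below is plain Perl and does not load Dancer2; `$title` is a hypothetical value standing in for text from a database, and the `/page/` prefix is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical value, standing in for text loaded from a database.
my $title = 'C++ (intro)';

# Without quoting, "++" and "(...)" are read as regex syntax.
my $unquoted_re = qr{\A/page/$title\z};

# \Q...\E escapes every regex metacharacter in $title.
my $quoted_re = qr{\A/page/\Q$title\E\z};

for my $path ('/page/C++ (intro)', '/page/CCC intro') {
    printf "%-20s unquoted: %-3s quoted: %s\n", $path,
        ($path =~ $unquoted_re ? 'yes' : 'no'),
        ($path =~ $quoted_re   ? 'yes' : 'no');
}
```

A regex built this way could then be passed to `get` in place of a string, as in the `qr{...}` form at the top of this report. Whether string routes tolerate the extra backslashes that `quotemeta` adds is not something this sketch checks.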

Is this an intentional feature?

If so, is it documented anywhere?

The closest thing I can find in the docs is the regex-in-string-form matching the user agent in the example at https://metacpan.org/pod/distribution/Dancer2/lib/Dancer2/Manual.pod#Conditional-Matching - and the rules for this are not explained.

pdl, Oct 14 '16 16:10

[...] Other characters are unsafe because gateways and other transport agents are known to sometimes modify such characters. These characters are "{", "}", "|", "\", "^", "~", "[", "]", and "`".

All unsafe characters must always be encoded within a URL. [...]

-- RFC 1738 2.2: URL Character Encoding Issues.

This means that while we should allow ( and ), this does not extend to all characters used in regular expressions.
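To make the overlap concrete, here is a small sketch that filters the usual regex metacharacters against the RFC 1738 "unsafe" list quoted above. The metacharacter list is an assumption about which characters matter for routes, not something taken from Dancer2.

```perl
use strict;
use warnings;

# Assumed list of regex metacharacters relevant to route patterns.
my @regex_meta = split //, '\\^$.|?*+()[]{}';

# The characters named in the RFC 1738 excerpt above.
my %rfc_unsafe = map { $_ => 1 } split //, '{}|\\^~[]`';

# Metacharacters that are NOT in the quoted unsafe list, so they could
# show up unencoded in a request path.
my @literal_in_url = grep { !$rfc_unsafe{$_} } @regex_meta;

print join(' ', @literal_in_url), "\n";
```

The characters left over are `$ . ? * + ( )`. Of these, `?` is reserved in HTTP URLs as the query separator, so in practice the ones a string route would need to treat literally are probably `$`, `.`, `*`, `+`, `(` and `)`.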

If we're currently not quoting strings correctly, we should write a test covering how we think it should behave, and then fix it.

xsawyerx, Oct 15 '16 18:10