tide
Nail down wildcard/fallback rules
The initial implementation of the router uses greedy matching for URL segments. Imagine you have two routes:
- `foo/{}/baz`
- `foo/new/bar`

The URL `foo/new/baz` will fail to be routed, because the `new` component will be greedily matched (concrete segments are always preferred over wildcards) and there is no backtracking.
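To make the failure concrete, here is a minimal sketch of a greedy segment trie. The `Node` type and its methods are hypothetical and are not tide's actual router; they only assume the behavior described above (literal children win, no backtracking).

```rust
use std::collections::HashMap;

/// One node of a hypothetical segment trie (not tide's real router type).
#[derive(Default)]
struct Node {
    literals: HashMap<String, Node>,
    wildcard: Option<Box<Node>>,
    is_route: bool,
}

impl Node {
    /// Register a route such as "foo/{}/baz".
    fn add(&mut self, route: &str) {
        let mut node = self;
        for seg in route.split('/') {
            node = if seg == "{}" {
                &mut **node.wildcard.get_or_insert_with(Box::default)
            } else {
                node.literals.entry(seg.to_string()).or_default()
            };
        }
        node.is_route = true;
    }

    /// Greedy lookup: a literal child always wins and the choice is never revisited.
    fn matches(&self, url: &str) -> bool {
        let mut node = self;
        for seg in url.split('/') {
            node = match node.literals.get(seg) {
                Some(child) => child,
                None => match node.wildcard.as_deref() {
                    Some(child) => child,
                    None => return false,
                },
            };
        }
        node.is_route
    }
}
```

With both routes registered, `foo/new/bar` and `foo/old/baz` match, but `foo/new/baz` does not: the lookup commits to the literal `new` branch, which has no `baz` child.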
This behavior is the simplest to implement, but it's not clear that it's the most obvious behavior. OTOH, it's not clear that the URL should match here. And in general, this kind of routing situation seems like an anti-pattern, and is perhaps something we should detect and disallow.
Thoughts welcome!
Maybe we can use regex to approach this. We can convert a path expression into a regex mechanically. This might contain some errors, but here are the rules:
- Prepend `^` and append `$`
- `{}` → `([^/]+)`
- `{}*` → `(.*)`
- `{foo}` → `(?P<foo>[^/]+)`
- `{foo}*` → `(?P<foo>.*)`

This way, `/foo/{}/baz` becomes `^/foo/([^/]+)/baz$` and `/foo/new/bar` becomes `^/foo/new/bar$`. Testing both regexes on `/foo/new/baz`, we can easily find out the path matches only the first one.
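The conversion rules could be sketched roughly as follows. `pattern_to_regex` is a hypothetical helper that only builds the regex source string; compiling it (for example with the `regex` crate) is left out. Escaping literal text is an assumption that the rules above do not mention.

```rust
/// Translate a path pattern into a regex source string.
/// Hypothetical helper; the result still has to be compiled by a regex engine.
fn pattern_to_regex(pattern: &str) -> String {
    let mut out = String::from("^");
    let mut chars = pattern.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '{' {
            // Collect the (possibly empty) name up to the closing brace.
            let mut name = String::new();
            for n in chars.by_ref() {
                if n == '}' {
                    break;
                }
                name.push(n);
            }
            // A trailing `*` turns the single-segment wildcard into a catch-all.
            let body = if chars.peek() == Some(&'*') {
                chars.next();
                ".*"
            } else {
                "[^/]+"
            };
            if name.is_empty() {
                out.push_str(&format!("({})", body));
            } else {
                out.push_str(&format!("(?P<{}>{})", name, body));
            }
        } else {
            // Assumption: literal text is escaped so that e.g. `.` stays literal.
            if r"\.+*?()|[]{}^$".contains(c) {
                out.push('\\');
            }
            out.push(c);
        }
    }
    out.push('$');
    out
}
```

For example, `pattern_to_regex("/foo/{}/baz")` yields `^/foo/([^/]+)/baz$`, matching the hand-converted form above.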
Bullet points:
- This can be done easily, and resolves most of the counterintuitive cases
- The `regex` crate is highly optimized, and we can benefit from that
- We should think about detecting duplicated routes
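On the last point, one possible way to detect duplicated routes is to erase parameter names and compare the remaining shapes. This is only a sketch under that assumption; `canonical` and `duplicates` are hypothetical names, and it does not catch routes that merely overlap (such as `/foo/{}/baz` and `/foo/new/baz`).

```rust
use std::collections::HashSet;

/// Erase parameter names so that `/users/{id}` and `/users/{name}` compare equal.
fn canonical(pattern: &str) -> String {
    let mut out = String::new();
    let mut in_braces = false;
    for c in pattern.chars() {
        match c {
            '{' => {
                in_braces = true;
                out.push(c);
            }
            '}' => {
                in_braces = false;
                out.push(c);
            }
            _ if in_braces => {} // drop the parameter name
            _ => out.push(c),
        }
    }
    out
}

/// Return every pattern whose shape collides with an earlier pattern.
fn duplicates<'a>(patterns: &[&'a str]) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    patterns
        .iter()
        .copied()
        .filter(|p| !seen.insert(canonical(p)))
        .collect()
}
```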
A scenario I'd like us to consider is the way GitHub does routing for users / organizations. Essentially I'd like us to be able to support a similar URL structure.
Route | Example |
---|---|
/:user/:repo | /yoshuawuyts/category |
/:user/:repo/settings | /yoshuawuyts/category/settings |
/:user/:repo/tree/:branch | /yoshuawuyts/category/tree/master |
/:user/:repo/blob/:branch/* | /yoshuawuyts/category/blob/master/content/2019-01-13-wasm-2019.md |
/marketplace | /marketplace |
/marketplace/category/chat | /marketplace/category/chat |
But isn't the routing structure specific to the way the app is structured? I mean you could have a RESTful API or you could have something arbitrary, right?
I can highly recommend `route-recognizer` or a similar algorithm. To draw an almost identical example to what @yoshuawuyts just noted, http://git.nemo157.com/grarr/blob/master/src/handler/blob.rs is being routed via the matcher `/*repo/blob/:ref/*path`. Having mid-route globs like this with a strongly defined precedence is very useful for some applications.
The problem with using pure regex is deciding the precedence when you have multiple matching routes. `route-recognizer` avoids this by having higher precedence for literal matches, then for single-variable segments, and finally lowest precedence for globs.
(EDIT: Although, I'm now remembering that it has literally the opposite order to what I mention so I was using a fork that reversed it.)
I didn't know about `route-recognizer`! Originally, I thought about a regex-like algorithm for route matching, but discarded that idea as it'd require implementing state machines, which is not so easy. I quickly looked through the docs and I feel this is definitely the better approach.
Lookup orders
- `static`
- named parameter, if `static` not found
- catch-all parameter, if named parameter not found
Example: https://github.com/trek-rs/path-tree/blob/master/src/lib.rs#L140-L176 Tests: https://github.com/trek-rs/path-tree/blob/master/tests/basic.rs
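As a rough illustration of that lookup order, here is a small trie that tries a static child first, then a named parameter, then a catch-all. It is not the code of path-tree or route-recognizer: the `Node` type and `lookup` function are hypothetical, the `:name` / `*name` syntax is borrowed from the examples in this thread, and backtracking into the next alternative when a deeper match fails is an added assumption.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct Node {
    statics: HashMap<String, Node>,
    param: Option<Box<Node>>,  // child reached through a `:name` segment
    catch_all: Option<String>, // route whose final `*` segment starts here
    route: Option<String>,     // route that terminates exactly at this node
}

impl Node {
    fn add(&mut self, route: &str) {
        let mut node = self;
        for seg in route.trim_start_matches('/').split('/') {
            if seg.starts_with('*') {
                node.catch_all = Some(route.to_string());
                return;
            }
            node = if seg.starts_with(':') {
                &mut **node.param.get_or_insert_with(Box::default)
            } else {
                node.statics.entry(seg.to_string()).or_default()
            };
        }
        node.route = Some(route.to_string());
    }

    fn find(&self, segs: &[&str]) -> Option<&str> {
        let (head, rest) = match segs.split_first() {
            Some(pair) => pair,
            None => return self.route.as_deref(),
        };
        // 1. static, 2. named parameter, 3. catch-all; fall through on failure.
        self.statics
            .get(*head)
            .and_then(|n| n.find(rest))
            .or_else(|| self.param.as_deref().and_then(|n| n.find(rest)))
            .or_else(|| self.catch_all.as_deref())
    }
}

/// Return the pattern of the route that matches `url`, if any.
fn lookup<'a>(root: &'a Node, url: &str) -> Option<&'a str> {
    let segs: Vec<&str> = url.trim_start_matches('/').split('/').collect();
    root.find(&segs)
}
```

With the GitHub-style routes from the table registered, `/marketplace` resolves to the static route, while `/marketplace/category` falls back to `/:user/:repo` because the static branch has no route ending there. In this sketch a catch-all needs at least one remaining segment to match.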
We've now moved to route-recognizer! To finish out this issue, though, we need to fix the route selection order in that crate. I'm working on getting ownership of the crate for the rustasync org so we can do that ourselves.
I found my way back to this issue and see it is open but blocked at the moment. I've just created PR #254 which relates to this - updating the documentation to match what is actually implemented through route-recognizer.
Playing around with the options it does lead to some 'interesting' routes that are supported: e.g. you can use nameless matches - they are nameless, but in terms of the parameter map they end up being indexed by an empty string - so you can have:
    async fn echo_empty(cx: Context<()>) -> Result<String, tide::Error> {
        // The nameless `:` segment is stored under the empty-string key.
        let nameless: String = cx.param("").client_err()?;
        Ok(nameless)
    }
    // snip
    app.at("/echo/:/:path").get(echo_empty);
I don't know if validation can be done of the path - for example maybe to panic if the above syntax is used. The other one that I tried out is a path like `/echo/*path/:one`, which will never be able to match any route, as the star consumes to the end of the path and then more is expected.
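If validation at registration time is wanted, it could look something like the sketch below. `RouteError` and `validate` are hypothetical and not part of tide or route-recognizer. Rejecting nameless segments and mid-route globs is a policy assumption; an earlier comment in this thread relies on mid-route globs, so that check in particular may not be desirable.

```rust
#[derive(Debug, PartialEq)]
enum RouteError {
    /// A `:` or `*` segment with no name after the sigil.
    NamelessParam { index: usize },
    /// A `*name` segment followed by further segments.
    GlobNotLast { index: usize },
}

/// Check a route pattern segment by segment and report the first problem found.
fn validate(route: &str) -> Result<(), RouteError> {
    let segs: Vec<&str> = route.trim_start_matches('/').split('/').collect();
    for (index, seg) in segs.iter().enumerate() {
        if *seg == ":" || *seg == "*" {
            return Err(RouteError::NamelessParam { index });
        }
        if seg.starts_with('*') && index + 1 != segs.len() {
            return Err(RouteError::GlobNotLast { index });
        }
    }
    Ok(())
}
```

A caller that prefers to fail fast could turn the `Err` into a panic when the route is registered.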
Note from triage: we want to survey routers in other languages and document their routing rules to decide how to proceed here.