tide: Nail down wildcard/fallback rules

The initial implementation of the router uses greedy matching for URL segments. Imagine you have two routes:

foo/{}/baz
foo/new/bar

The URL foo/new/baz will fail to be routed, because the new component will be greedily matched (concrete segments are always preferred over wildcards) and there is no backtracking.

This behavior is the simplest to implement, but it's not clear that it's the most obvious behavior. On the other hand, it's not clear that the URL should match here either. In general, this kind of routing situation seems like an anti-pattern, and perhaps something we should detect and disallow.
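To make the greedy behavior concrete, here is a minimal sketch of a segment trie. It assumes routes are split on `/` and that `{}` is the only wildcard syntax. `Node`, `lookup_greedy` and `lookup_backtracking` are hypothetical names, not Tide's router code. The greedy lookup commits to a concrete child as soon as one exists. The backtracking variant keeps the same preference but retries the wildcard branch when the concrete branch dead-ends.

```rust
use std::collections::HashMap;

/// A node in a hypothetical segment trie: `{}` segments go into the single
/// wildcard child, every other segment becomes a literal child.
#[derive(Default)]
struct Node {
    literals: HashMap<String, Node>,
    wildcard: Option<Box<Node>>,
    /// The route pattern that ends at this node, if any.
    route: Option<String>,
}

impl Node {
    fn insert(&mut self, pattern: &str) {
        let mut node = self;
        for seg in pattern.split('/') {
            node = if seg == "{}" {
                &mut **node.wildcard.get_or_insert_with(|| Box::new(Node::default()))
            } else {
                node.literals.entry(seg.to_string()).or_default()
            };
        }
        node.route = Some(pattern.to_string());
    }

    /// Greedy lookup: prefer a literal child, fall back to the wildcard only
    /// when no literal child exists. Once committed, there is no going back.
    fn lookup_greedy(&self, segs: &[&str]) -> Option<&str> {
        let mut node = self;
        for seg in segs {
            node = match node.literals.get(*seg) {
                Some(child) => child,
                None => node.wildcard.as_deref()?,
            };
        }
        node.route.as_deref()
    }

    /// Backtracking lookup: same preference order, but if the literal branch
    /// fails to reach a route, the wildcard branch is tried as well.
    fn lookup_backtracking(&self, segs: &[&str]) -> Option<&str> {
        let Some((first, rest)) = segs.split_first() else {
            return self.route.as_deref();
        };
        self.literals
            .get(*first)
            .and_then(|child| child.lookup_backtracking(rest))
            .or_else(|| self.wildcard.as_ref()?.lookup_backtracking(rest))
    }
}
```

With both example routes inserted, `lookup_greedy` returns `None` for foo/new/baz, while `lookup_backtracking` finds `foo/{}/baz`.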

Thoughts welcome!

Nov 08 '18 05:11 aturon

Maybe we can use regexes to approach this. We can convert path expressions into regexes mechanically. This might contain some errors, but here are the rules:

Prepend ^ and append $
{} → ([^/]+)
{}* → (.*)
{foo} → (?P<foo>[^/]+)
{foo}* → (?P<foo>.*)

This way, /foo/{}/baz becomes ^/foo/([^/]+)/baz$ and /foo/new/bar becomes ^/foo/new/bar$. Testing both regexes against /foo/new/baz, we can easily see that the path matches only the first one.
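The conversion rules above are mechanical enough to sketch in a few lines. This hypothetical `pattern_to_regex` only builds the pattern string and assumes `{` and `}` never appear as literal text in a route. Compiling and matching would then go through the regex crate, which is left out here so the sketch has no dependencies. Literal characters that are special in regex syntax are escaped with a backslash.

```rust
/// Converts a route pattern into a regex string:
/// `{}` -> `([^/]+)`, `{}*` -> `(.*)`, `{foo}` -> `(?P<foo>[^/]+)`,
/// `{foo}*` -> `(?P<foo>.*)`, anchored with `^` and `$`.
fn pattern_to_regex(pattern: &str) -> String {
    let mut out = String::from("^");
    let mut chars = pattern.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '{' {
            // Collect the (possibly empty) parameter name up to the closing brace.
            let mut name = String::new();
            for n in chars.by_ref() {
                if n == '}' {
                    break;
                }
                name.push(n);
            }
            // A `*` right after the closing brace makes it a catch-all.
            let is_glob = chars.peek() == Some(&'*');
            if is_glob {
                chars.next();
            }
            let body = if is_glob { ".*" } else { "[^/]+" };
            if name.is_empty() {
                out.push_str(&format!("({body})"));
            } else {
                out.push_str(&format!("(?P<{name}>{body})"));
            }
        } else {
            // Escape regex metacharacters that appear in literal text.
            if r"\.+*?()|[]{}^$".contains(c) {
                out.push('\\');
            }
            out.push(c);
        }
    }
    out.push('$');
    out
}
```

For example, `pattern_to_regex("/files/{path}*")` yields `^/files/(?P<path>.*)$`.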

Bullet points:

This can be done easily, and resolves most of the counterintuitive cases
The regex crate is highly optimized, and we can benefit from that
We should think about detecting duplicated routes
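One cheap form of duplicate detection is to erase parameter names and compare the resulting route shapes. This sketch assumes the `{}` / `{name}` / `{name}*` syntax from the rules above. `shape` and `find_duplicates` are hypothetical helpers.

```rust
use std::collections::hash_map::{Entry, HashMap};

/// Erases parameter names so `/foo/{id}/baz` and `/foo/{}/baz` share one shape.
fn shape(route: &str) -> String {
    route
        .trim_matches('/')
        .split('/')
        .map(|seg| {
            if seg.starts_with('{') && seg.ends_with("}*") {
                "{}*"
            } else if seg.starts_with('{') && seg.ends_with('}') {
                "{}"
            } else {
                seg
            }
        })
        .collect::<Vec<_>>()
        .join("/")
}

/// Returns (first registered, duplicate) pairs of routes with identical shapes.
fn find_duplicates<'a>(routes: &[&'a str]) -> Vec<(&'a str, &'a str)> {
    let mut seen: HashMap<String, &'a str> = HashMap::new();
    let mut dups = Vec::new();
    for &route in routes {
        match seen.entry(shape(route)) {
            Entry::Occupied(first) => dups.push((*first.get(), route)),
            Entry::Vacant(slot) => {
                slot.insert(route);
            }
        }
    }
    dups
}
```

This only flags routes that match exactly the same URLs. Overlapping routes such as foo/{}/baz and foo/new/baz are not reported and still need a precedence rule.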

Feb 22 '19 17:02 tirr-c

A scenario I'd like us to consider is the way GitHub does routing for users / organizations. Essentially I'd like us to be able to support a similar URL structure.

Route	Example
`/:user/:repo`	`/yoshuawuyts/category`
`/:user/:repo/settings`	`/yoshuawuyts/category/settings`
`/:user/:repo/tree/:branch`	`/yoshuawuyts/category/tree/master`
`/:user/:repo/blob/:branch/*`	`/yoshuawuyts/category/blob/master/content/2019-01-13-wasm-2019.md`
`/marketplace`	`/marketplace`
`/marketplace/category/chat`	`/marketplace/category/chat`

Feb 22 '19 17:02 yoshuawuyts

But isn't the routing structure specific to the way the app is structured? I mean, you could have a RESTful API or you could have something arbitrary, right?

Feb 22 '19 17:02 bIgBV

I can highly recommend route-recognizer or a similar algorithm. To draw an almost identical example to what @yoshuawuyts just noted, http://git.nemo157.com/grarr/blob/master/src/handler/blob.rs is routed via the matcher /*repo/blob/:ref/*path. Having mid-route globs like this, with a strongly defined precedence, is very useful for some applications.

The problem with using pure regexes is deciding precedence when multiple routes match. The route-recognizer crate avoids this by giving literal matches the highest precedence, then single-variable segments, and finally globs the lowest precedence.

(EDIT: Although I now remember that it has literally the opposite order to what I described, so I was using a fork that reversed it.)
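A sketch of precedence-based selection, assuming `:name` for single-segment parameters and `*name` for globs that consume one or more segments (mid-route globs included, via backtracking). Every matching route is ranked by the kinds of segments it used, compared left to right with literal > parameter > glob. This ranking rule is an illustrative assumption, not route-recognizer's actual metric, and `match_route` and `best_route` are hypothetical names.

```rust
/// How a path segment was matched, declared from lowest to highest
/// precedence so the derived `Ord` ranks `Static` > `Param` > `Glob`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Kind {
    Glob,
    Param,
    Static,
}

fn kind_of(seg: &str) -> Kind {
    if seg.starts_with('*') {
        Kind::Glob
    } else if seg.starts_with(':') {
        Kind::Param
    } else {
        Kind::Static
    }
}

/// Returns, for each path segment, the kind of pattern segment that consumed it.
fn match_route(pat: &[&str], path: &[&str]) -> Option<Vec<Kind>> {
    let Some((&p, pat_rest)) = pat.split_first() else {
        // Pattern exhausted: it matches only if the path is exhausted too.
        return if path.is_empty() { Some(Vec::new()) } else { None };
    };
    match kind_of(p) {
        Kind::Glob => {
            // Backtracking: let the glob take 1, 2, ... segments and keep the
            // first split for which the rest of the pattern matches.
            for take in 1..=path.len() {
                if let Some(rest) = match_route(pat_rest, &path[take..]) {
                    let mut kinds = vec![Kind::Glob; take];
                    kinds.extend(rest);
                    return Some(kinds);
                }
            }
            None
        }
        kind => {
            let (&seg, path_rest) = path.split_first()?;
            if kind == Kind::Static && seg != p {
                return None;
            }
            let mut kinds = vec![kind];
            kinds.extend(match_route(pat_rest, path_rest)?);
            Some(kinds)
        }
    }
}

/// Among all routes matching `path`, picks the one whose segment kinds
/// compare greatest, left to right.
fn best_route<'a>(routes: &[&'a str], path: &str) -> Option<&'a str> {
    let path_segs: Vec<&str> = path.trim_matches('/').split('/').collect();
    routes
        .iter()
        .filter_map(|&route| {
            let pat: Vec<&str> = route.trim_matches('/').split('/').collect();
            match_route(&pat, &path_segs).map(|kinds| (kinds, route))
        })
        .max_by(|a, b| a.0.cmp(&b.0))
        .map(|(_, route)| route)
}
```

With `/foo/:x/bar` and `/foo/new/bar` both registered, `/foo/new/bar` selects the all-literal route, and a path shaped like the grarr example is picked up by `/*repo/blob/:ref/*path`.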

Feb 22 '19 17:02 Nemo157

I didn't know about route-recognizer! Originally, I thought about a regex-like algorithm for route matching, but discarded the idea because it would require implementing state machines, which is not so easy. I quickly looked through the docs, and I feel this is definitely the better approach.

Feb 22 '19 17:02 tirr-c

Lookup order:

static
named parameter, if no static match is found
catch-all parameter, if no named parameter match is found

Example: https://github.com/trek-rs/path-tree/blob/master/src/lib.rs#L140-L176
Tests: https://github.com/trek-rs/path-tree/blob/master/tests/basic.rs
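The lookup order above can be sketched as a segment tree whose nodes hold three kinds of children. This is a simplified illustration, not path-tree's radix-tree implementation: `Tree`, `insert`, `lookup` and `find` are hypothetical, and "not found" is interpreted as "no complete match down that branch", so a failed static branch falls back to the named child, and then to the catch-all.

```rust
use std::collections::HashMap;

/// Hypothetical segment tree: static children, at most one `:name` child,
/// and at most one `*name` catch-all per node.
#[derive(Default)]
struct Tree {
    statics: HashMap<String, Tree>,
    named: Option<(String, Box<Tree>)>,
    /// (parameter name, full route) for a `*name` catch-all at this node.
    catch_all: Option<(String, String)>,
    /// Route that ends exactly at this node, if any.
    route: Option<String>,
}

impl Tree {
    fn insert(&mut self, route: &str) {
        let mut node = self;
        for seg in route.trim_matches('/').split('/') {
            if let Some(name) = seg.strip_prefix('*') {
                // Assumes the catch-all is the last segment of the route.
                node.catch_all = Some((name.to_string(), route.to_string()));
                return;
            }
            node = match seg.strip_prefix(':') {
                // Keeps the first parameter name if routes share this position.
                Some(name) => {
                    &mut *node
                        .named
                        .get_or_insert_with(|| (name.to_string(), Box::default()))
                        .1
                }
                None => node.statics.entry(seg.to_string()).or_default(),
            };
        }
        node.route = Some(route.to_string());
    }

    /// Returns the matched route and its captured parameters.
    fn lookup(&self, path: &str) -> Option<(String, HashMap<String, String>)> {
        let segs: Vec<&str> = path.trim_matches('/').split('/').collect();
        let mut params = HashMap::new();
        let route = self.find(&segs, &mut params)?;
        Some((route.to_string(), params))
    }

    fn find<'a>(&'a self, segs: &[&str], params: &mut HashMap<String, String>) -> Option<&'a str> {
        let Some((&seg, rest)) = segs.split_first() else {
            return self.route.as_deref();
        };
        // 1. Static child.
        if let Some(found) = self.statics.get(seg).and_then(|c| c.find(rest, params)) {
            return Some(found);
        }
        // 2. Named parameter, only if the static branch found nothing.
        if let Some((name, child)) = &self.named {
            if let Some(found) = child.find(rest, params) {
                params.insert(name.clone(), seg.to_string());
                return Some(found);
            }
        }
        // 3. Catch-all: capture all remaining segments.
        if let Some((name, route)) = &self.catch_all {
            params.insert(name.clone(), segs.join("/"));
            return Some(route.as_str());
        }
        None
    }
}
```

Parameters are only recorded on the branch that succeeds, so a failed static attempt leaves no stale captures behind.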

Mar 11 '19 09:03 fundon

We've now moved to route-recognizer! To finish out this issue, though, we need to fix the route selection order in that crate. I'm working on getting ownership of the crate for the rustasync org so we can do that ourselves.

Apr 10 '19 20:04 aturon

I found my way back to this issue and see that it is open but blocked at the moment. I've just created PR #254, which relates to this: it updates the documentation to match what is actually implemented through route-recognizer.

Playing around with the options does turn up some 'interesting' routes that are supported. For example, you can use nameless matches: they have no name, so in the parameter map they end up indexed by an empty string. So you can have:

async fn echo_empty(cx: Context<()>) -> Result<String, tide::Error> {
    // The nameless `:` segment is stored in the parameter map under "".
    let nameless: String = cx.param("").client_err()?;
    Ok(nameless)
}
// snip
app.at("/echo/:/:path").get(echo_empty);

I don't know whether the path can be validated, for example by panicking if the above syntax is used. The other one I tried is a path like /echo/*path/:one, which will never be able to match any route: the star consumes everything to the end of the path, and then the route expects one more segment.
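Both problems could be caught when a route is registered. A minimal sketch, assuming globs are greedy as described above, so any segment after a `*` is unreachable. `validate_route` and `RouteError` are hypothetical; a router could panic on the `Err` at registration time. A router with backtracking globs, like the /*repo/blob/:ref/*path matcher mentioned earlier, would drop the second check.

```rust
/// Reasons a route pattern could be rejected at registration time.
/// Indices count segments after the leading `/` is trimmed.
#[derive(Debug, PartialEq)]
enum RouteError {
    /// A `:` segment without a name, e.g. the middle segment of `/echo/:/:path`.
    NamelessParam { index: usize },
    /// A greedy `*` glob followed by more segments, e.g. `/echo/*path/:one`.
    UnreachableAfterGlob { glob_index: usize },
}

/// Hypothetical registration-time check for a single route pattern.
fn validate_route(route: &str) -> Result<(), RouteError> {
    let segs: Vec<&str> = route.trim_matches('/').split('/').collect();
    for (i, seg) in segs.iter().enumerate() {
        if *seg == ":" {
            return Err(RouteError::NamelessParam { index: i });
        }
        // Assumes globs consume the rest of the path, so nothing may follow.
        if seg.starts_with('*') && i + 1 < segs.len() {
            return Err(RouteError::UnreachableAfterGlob { glob_index: i });
        }
    }
    Ok(())
}
```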

May 21 '19 16:05 gameldar

Note from triage: we want to survey routers in other languages and document their routing rules to decide how to proceed here.

Nov 12 '20 16:11 yoshuawuyts
