hyper-router icon indicating copy to clipboard operation
hyper-router copied to clipboard

Would you be open to a feature to allow passing regex captures onto handlers

Open 8176135 opened this issue 6 years ago • 14 comments

I think it would be pretty nice to be able to do regex captures and have the arguments passed onto the handler. This would need to be on a separate struct from Route since captures cost more performance than just matches. E.g.

RouterBuilder::new()
        .add(RouteWithCaptures::get("/hello/(?P<name>[^/]*)/?").using(request_handler))
...
fn request_handler(a: Request<Body>, args: HashMap<String,String>) {
...
}

This could also be a stepping stone for the

Todo: add the ability to distinguish requests by query parameters.

Would you be open to a PR implementing this?

8176135 avatar Jul 17 '19 06:07 8176135

Well actually I just tried to do it, while the concept is not difficult (just collect the regex captures into a hashmap, then pass it into the handler), unless there is some generic trick I can't think of, it requires duplicating a lot of code if we want to have it in a separate type.

8176135 avatar Jul 17 '19 07:07 8176135

Sure I'd be open to this contribution :) In fact I planned doing this long time ago, but I didn't have enough Rust skills back then, and now my interests shifted, so... yeah PR would be great! Thanks!

marad avatar Jul 23 '19 06:07 marad

Hi all, Is there a PR currently being developed? I decided to take a crack at it, and have something basic working. It's not very great, very basic, but gets the job done, and it's roughly at the level that you would expect a hyper router to be, eg. it doesn't try to do too many clever things with types, parsing, validation, etc, letting the user of the library handle that on their own.

fn greeting_for_name (input: Vec<String>) -> String {
    return if input[1].len() == 0 {
        "Hello, stranger!".to_string()
    } else {
        format!("Hello, {}!", input[1])
    }
}

fn greet_handler(_: Request<Body>, input: Vec<String>) -> Response<Body> {
    let greeting = greeting_for_name(input);
    Response::builder()
        .header(CONTENT_LENGTH, greeting.len() as u64)
        .header(CONTENT_TYPE, "text/plain")
        .body(Body::from(greeting))
        .expect("Failed to construct the response")
}

fn router_service() -> Result<RouterService, std::io::Error> {
    let router = RouterBuilder::new()
        .add(Route::get("/hello").using(request_handler))
        .add(Route::from(Method::PATCH, "/world").using(request_handler))
        .add(Route::get("/greet/to(.*)").using(greet_handler))
        .build();

    Ok(RouterService::new(router))
}

https://github.com/akatechis/hyper-router/commit/f33abf13671575f6d014b0cfb56356405cf548d3

akatechis avatar Jul 25 '19 19:07 akatechis

@akatechis I have been working a PR, but I haven't managed to get code duplication down to a point I like yet.

The problem as I mentioned above is that I didn't want to slot in the captures into the current structs, since parsing captures aren't free, especially if we are going to collect all the captures into a new Vec/Hashmap. Though I'm not sure if I am preoptimizing here.

8176135 avatar Jul 26 '19 01:07 8176135

@akatechis Thanks for also looking into that!

@akatechis @8176135 Do you guys have your codes somwhere, so I can look at them? Maybe I could suggest something, or maybe we could work something out together?

marad avatar Jul 26 '19 06:07 marad

My current commit is here: 8176135@51db765ecabc3e0bee7c43e8ec10c1bc1f17986a

I tried to move as much stuff into traits as possible, so that there is minimal code duplication between the structs, not sure if this is the correct method though.

Still working on getting the router working for both, feel like I have to duplicate and trait that too.

Though if someone didn't put any captures into their regex, I'm not sure how much the performance is going to decrease if we just look for it anyway, so I might be over optimizing here. (Actually I am going to benchmark that right now)

8176135 avatar Jul 26 '19 07:07 8176135

After some crude benchmarking, it seems like capture then creating a Vec runs about 6 times slower than is_match, even when there isn't a capture group in the regex. (without creating a Vec is about 3 times slower, but we have to create the Vec or Hashmap anyway)

Test I used:

fn main() {
	let regex = regex::Regex::new("https://a.*b[^assw]+").unwrap();
	for _ in 0..10000 {
		let a = regex.is_match("https://");
		let a = regex.is_match("https://asdsadwdwb");
		let a = regex.is_match("https://asdsadwdwbs");
		let a = regex.is_match("https://asdsadwdwbddd");
		let a = regex.is_match("https://asdsadwdwbddds");
		let b = regex.is_match("bttps://a");
	}

	let start = std::time::Instant::now();

	for _ in 0..10000 {
		let a = regex.captures("https://").map(|c| c.iter().map(|c| c.unwrap().as_str().to_owned()).collect::<Vec<String>>()).unwrap_or(Vec::new());
		let a = regex.captures("https://asdsadwdwb").map(|c| c.iter().map(|c| c.unwrap().as_str().to_owned()).collect::<Vec<String>>()).unwrap_or(Vec::new());
		let a = regex.captures("https://asdsadwdwbs").map(|c| c.iter().map(|c| c.unwrap().as_str().to_owned()).collect::<Vec<String>>()).unwrap_or(Vec::new());
		let a = regex.captures("https://asdsadwdwbddd").map(|c| c.iter().map(|c| c.unwrap().as_str().to_owned()).collect::<Vec<String>>()).unwrap_or(Vec::new());
		let a = regex.captures("https://asdsadwdwbddds").map(|c| c.iter().map(|c| c.unwrap().as_str().to_owned()).collect::<Vec<String>>()).unwrap_or(Vec::new());
		let b = regex.captures("bttps://a").map(|c| c.iter().map(|c| c.unwrap().as_str().to_owned()).collect::<Vec<String>>()).unwrap_or(Vec::new());
	}

	dbg!(std::time::Instant::now().duration_since(start));


	let start = std::time::Instant::now();

	for _ in 0..10000 {
		let a = regex.is_match("https://");
		let a = regex.is_match("https://asdsadwdwb");
		let a = regex.is_match("https://asdsadwdwbs");
		let a = regex.is_match("https://asdsadwdwbddd");
		let a = regex.is_match("https://asdsadwdwbddds");
		let b = regex.is_match("bttps://a");
	}

	dbg!(std::time::Instant::now().duration_since(start));
}

8176135 avatar Jul 26 '19 08:07 8176135

Hmm, looking at this I started wondering if regex paths is what hyper router needs. What is the basic need that we are trying to fulfill?

I suspect that the most important thing is to be able to capture some parts of the URL. Maybe it would be faster to create a custom parser for URLs like this:

/orders/{id}/status

I imagine this would be significantly faster since it does not require regex matching, and I think it would address most needs regarding URLs. Also would be easier to use since... there would be no RegExes :)

What are some real-life use cases for using the regex power to match URLs?

marad avatar Jul 26 '19 13:07 marad

I dont think there's much of a use case for regex specifically, I think I was just using regex because it was the shortest path to get param capture working. Alternatives are node.js libraries like express/connect which use /orders/:id/status and python's flask which uses /orders/<id>/status.

akatechis avatar Jul 26 '19 14:07 akatechis

I hope this isn't too off-topic, but I sat down to try out some different kinds of APIs to describe router params and I was wondering if the following might be even easier to use. It's closer to a RESTful interface, so I don't know if the concept of a Resource might be a good abstraction:

fn router_service() -> Result<RouterService, std::io::Error> {
    let router = RouterBuilder::new()
        .add(Resource::new("/user/:id")
            .get(request_handler)
            .post(request_handler))
        .build();

    Ok(RouterService::new(router))
}

Internally, the Resource struct could have Option<Handler> for each of the methods, rather than a Vec, so we can avoid the extra iteration through the list of handlers for each method.

akatechis avatar Jul 26 '19 15:07 akatechis

@akatechis I'd rather not use REST nomenclature since there is nothing RESTful about the hyper-router and I like it that way.

I think that this is outside of the scope of this ticket. Changing the basic API that everyone uses should be considered very carefully.

As for the exact form of the path variables the most natural for me is with using {id} since I come from Java world. But I think the express way (:id)is even simpler to parse.

marad avatar Jul 27 '19 09:07 marad

Agreed on both issues. I'll try preparing something sometime today :)

akatechis avatar Jul 27 '19 12:07 akatechis

I've opened PR #20 for this.

akatechis avatar Jul 30 '19 00:07 akatechis

Thanks @akatechis! I'll take a look at this PR during the week or even weekend (depends on workload)

marad avatar Jul 30 '19 05:07 marad