Improve Router Functionality
There are a few things we should do to improve the functionality of the current router. Each of these should be a smaller PR.
Chunk the REGEX
The current implementation has a bug that will crash the application in the event that the expression is too long. We can solve this by breaking it down into smaller chunks.
Group Prefixes
We can improve the performance of the expression by grouping by route prefix.
Route Caching
We may (or may not) benefit from route caching. We should run some benchmarks and do some tests.
Edit: I was mistaken and we are caching (see #211). We should just dust this off and make sure everything is still looking good here.
Resolve Params REGEX
In #181, @brendt noted:
I think we can refactor resolveParams to either not use regex matches, or a more simplified version; since we're now sure we already have a match
I'm not sure if this is still possible with the latest PR #486, but we should take a look.
Abstract parts of route matching
In #181, @brendt noted:
Add a RouteMatcher class that matches a Route based on a Request and cleans up parts of the Router
We should revisit this. I think it is worth looking to the structure of FastRoute for some inspiration.
Cleanup Custom REGEX
We need to make some design decisions about custom REGEX and implement accordingly. Specifically:
- Should we support spaces around parameter names/REGEX (e.g.,
/users/{ username: \w }). - How do we want to handle weird scenarios like:
/{lang:(en|de)}or/foo/{test}/{test:\d+}?
CC: @vsergiu93, @N-Silbernagel
I am curious to implement Group Prefixes. I also think I will be able to build Chunking into it, even without the 'hack' the symfony maintainers used.
The idea I am currently on is to build up a tree of paths and when we are done with discovery I can easily generate an optimized regex from it.
It is the same approach I used some time ago with a custom extensible routing engine in some old node framework. I wrote about it here if anybody is interested. The big difference in this case is that we do not use said tree for route matching, but produce an optimized regex (or multiple because of chunking) .
For now I will leave the other sections as is but will keep it in mind when implementing. Maybe they are easy to include, but I rather submit multiple smaller PRs than one big one ;)
@blackshadev please feel free to take a look! I'm not entirely understanding where you are coming from with that article and referencing Symfony's solution as a hack, so just a few guidelines:
-
Generally, I think we want to maintain the compiled expression (just chunked if it's too large). It's the fastest route matching option.
-
Let's benchmark any changes with
tempest-benchmark.
Thanks!
Small update
I am hacking away in this branch for the curious minded. It is still very WIP, in my eyes even to soon for a draft PR. Do say so if you disagree.
What I did do
- Using the route discovery build up a routing tree
- Use said to to build a matching regex per method with group prefixes
- Extract parameters out using the matching regex instead of separate match
Example:
I got an example app running with these routes:
/
/person
/person/{id}
/person/{id}/name
/person/{id}/id
/person/id/{id}
which will produce the following matching regex
#(?|/person(?|/id(?|/([^/]++)(*MARK:e))|/([^/]++)(?|/name(*MARK:c)|/id(*MARK:d)|(*MARK:b))))#
The first two routes are removed because they are static routes without any parameters, this was already an optimization in place. All other routes should be in there.
What I am missing
- [ ] Tests: any tests.
- [ ] A bit more thought regarding code design:
- [ ] I bolted my solution on top of the RouteConfig, but I think the RouteConfig en GenericRouter would benefit from some code splitting for separate usecases.
- [ ] Having a the routing tree build separate from where the regex is build and chunked would be nice
- [ ] To use recursion or not? If nobody has strong opinions I will weigh benchmarking and code-readability against each other.
- [ ] Handling some edge cases better
- [ ] like overwriting routes, I did not check how the old code handled these, but the outcome should be the same)
- [ ] Adding multiple matching regexes on the same path (so 2 paths one
/{id: \d+}the other/{id}
- [ ] Benchmarking
- [ ] Validate if the RouteConfig cache still functions (although it should, I didn't change that much)
- [ ] Handling chucking when the regex becomes to large
Update from my side.
I implemented a feature complete version of the routing tree, but yet entirely optimized and cleaned. But I wanted to get a rough benchmark first.
In summary
I will post the raw results below. My conclusion is as follows: this whole route prefixing isn't really that much faster. For existing routes it is a bit faster, but for non-existing routes it clearly isn't. That said, I also implemented some other optimizations, like not regex matching the route parameters again after the route has been matched and not extracting the route parameters in the match, but during discovery instead. I did expect there to be more of a difference, but in the end I can only conclude that PHP itself has been optimized to such a degree that it is hard to draw any hard conclusions from this testing methodology,
I do think it is worth the effort to continue. But it will not have such a severe impact on most applications. I do think my algorithm and code will be easier to extent later on with new routing features. I will try to optimize my code more and specifically the discovery part (which I didn't put any effort in right now). I will also clean up my changes and provide a more granular benchmark for the router itself.
Hopefully this proves some insights. If anybody (@brendt and @aidan-casey specifically) know none of this will make it into the framework, please do say so before I invest allot more effort into finalizing it.
In depth
Testing setup
I used the tempest-benchmark repo as a base, although it is a bit barebone. I altered their controller templates to actually work (return Ok and add types to the parameters). I ran setup.php with 100 as a parameter resulting in roughly 100 * 4 * 6 routes. Some of these routes are nested, but most aren't. this means this is not the best case scenario for the group prefixing algorithm, but I do think it is an "ok" representation of more realworld examples. A best case routing setup would be really deeply nested, but that isn't al that common in the real world.
I used a small docker setup to run the server, I noticed with the build in server php tempest serve to much of a difference per run, which made my results really inconclusive. I was also considerably slower on all fronts. I attached the docker files for anybody to use. It is a small and copied together setup, but it does work wonders for having more consistent results.
Testing method
I tested against /posts/controller_001/5 as a positive match and http://localhost:8081/posts/controller_001/5/nonexisting/x for a negative (non existing) one. The non existing route is the worst case route for the grouping prefix regex as it matches up until the 5 before finding out this isn't a match and it need to backtrack through all nested groups. The positive one is also not "the best" possible route for both algorithms, since both route matching regexes for main and the new branch aren't alphabethically ordered and thus could be anywhere in the regex.
Files
Raw results
Docker files
After my initial "real life" benchmarks I wanted a more fine grained benchmark with more precise methods.
With use of phpbench I managed do get the following results:
My branch
+--------------------+------------+--------------+------+-----+----------+----------+--------+
| benchmark | subject | set | revs | its | mem_peak | mode | rstdev |
+--------------------+------------+--------------+------+-----+----------+----------+--------+
| GenericRouterBench | benchMatch | Dynamic | 100 | 5 | 10.609mb | 47.802μs | ±0.86% |
| GenericRouterBench | benchMatch | Non existing | 100 | 5 | 10.610mb | 47.031μs | ±1.33% |
| GenericRouterBench | benchMatch | Static | 100 | 5 | 10.609mb | 46.981μs | ±1.18% |
+--------------------+------------+--------------+------+-----+----------+----------+--------+
Main branch
+--------------------+------------------+--------------+------+-----+----------+-----------+--------+
| benchmark | subject | set | revs | its | mem_peak | mode | rstdev |
+--------------------+------------------+--------------+------+-----+----------+-----------+--------+
| GenericRouterBench | benchMatch | Dynamic | 100 | 5 | 10.578mb | 60.639μs | ±5.14% |
| GenericRouterBench | benchMatch | Non existing | 100 | 5 | 10.578mb | 59.391μs | ±0.39% |
| GenericRouterBench | benchMatch | Static | 100 | 5 | 10.578mb | 60.423μs | ±1.49% |
+--------------------+------------------+--------------+------+-----+----------+-----------+--------+
Meaning my code is noticeable faster, and more stable on the positive match. It is less stable (but faster) on the negative match. Still this is such a small difference that the main advantage will be: throughput and a better surface to extend later on.
In my code I will also be including this benchmark tool. I think it will be a really good resource to use when optimizing hot code paths like these.
Benchmark
Looks promising! @aidan-casey what are your thoughts?