POC for hyperscan usage in UrlDispatcher
https://pypi.org/project/hyperscan/ could speed up url dispatcher if available.
Fallback to the current implementation is supported if hyperscan compilation fails or not available.
CodSpeed Performance Report
Merging #9907 will degrade performances by 26.95%
Comparing hyperscan (a87dc32) with master (e79b2d5)
Summary
⚡ 3 improvements
❌ 3 regressions
✅ 37 untouched benchmarks
:warning: Please fix the performance issues or acknowledge them on CodSpeed.
Benchmarks breakdown
| Benchmark | BASE |
HEAD |
Change | |
|---|---|---|---|---|
| ❌ | test_resolve_dynamic_resource_url_with_many_dynamic_routes[pyloop] |
3.7 ms | 5.1 ms | -26.95% |
| ⚡ | test_resolve_dynamic_resource_url_with_many_dynamic_routes_with_common_prefix[pyloop] |
260.6 ms | 4.9 ms | ×53 |
| ❌ | test_resolve_dynamic_resource_url_with_many_static_routes[pyloop] |
3.7 ms | 5 ms | -25.34% |
| ⚡ | test_resolve_gitapi[pyloop] |
316 ms | 5.8 ms | ×55 |
| ⚡ | test_resolve_gitapi_subapps[pyloop] |
321 ms | 7.9 ms | ×41 |
| ❌ | test_resolve_static_root_route[pyloop] |
1.1 ms | 1.2 ms | -10.84% |
Managed to finish the benchmarks before boarding the flight. They should run if you update with master.
Didn't have enough time to dig through downstream or write the tests yet though
Thank you!
Codecov Report
Attention: Patch coverage is 82.55814% with 15 lines in your changes missing coverage. Please review.
Project coverage is 98.69%. Comparing base (
e79b2d5) to head (a87dc32).
:white_check_mark: All tests successful. No failed tests found.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| aiohttp/web_urldispatcher.py | 82.55% | 8 Missing and 7 partials :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## master #9907 +/- ##
==========================================
- Coverage 98.73% 98.69% -0.05%
==========================================
Files 121 121
Lines 36747 36825 +78
Branches 4387 4404 +17
==========================================
+ Hits 36281 36343 +62
- Misses 314 323 +9
- Partials 152 159 +7
| Flag | Coverage Δ | |
|---|---|---|
| CI-GHA | 98.57% <81.39%> (-0.05%) |
:arrow_down: |
| OS-Linux | 98.26% <81.39%> (-0.05%) |
:arrow_down: |
| OS-Windows | 96.06% <66.27%> (-0.08%) |
:arrow_down: |
| OS-macOS | 97.36% <79.06%> (-0.04%) |
:arrow_down: |
| Py-3.10.11 | 97.22% <81.39%> (-0.03%) |
:arrow_down: |
| Py-3.10.15 | 97.75% <79.06%> (-0.10%) |
:arrow_down: |
| Py-3.11.10 | 97.78% <79.06%> (-0.06%) |
:arrow_down: |
| Py-3.11.9 | 97.26% <81.39%> (-0.05%) |
:arrow_down: |
| Py-3.12.7 | 98.32% <81.39%> (-0.05%) |
:arrow_down: |
| Py-3.13.0 | 98.31% <81.39%> (-0.06%) |
:arrow_down: |
| Py-3.9.13 | 97.13% <81.17%> (-0.05%) |
:arrow_down: |
| Py-3.9.20 | 97.67% <78.82%> (-0.05%) |
:arrow_down: |
| Py-pypy7.3.16 | 97.25% <65.88%> (-0.08%) |
:arrow_down: |
| VM-macos | 97.36% <79.06%> (-0.04%) |
:arrow_down: |
| VM-ubuntu | 98.26% <81.39%> (-0.05%) |
:arrow_down: |
| VM-windows | 96.06% <66.27%> (-0.08%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Seems to be a lot slower in the benchmarks. I suspect that's due to our previous refactoring, so in the typical case we're doing a couple of dictionary lookups, which is probably a lot faster than hyperscan. Maybe if there's a hyperscan under each dictionary lookup, that might be an improvement?
i.e. Instead of replacing the entire loop, maybe it could replace the await candidate.resolve(request) line in certain circumstances (ideally only when there are regex patterns actually involved).
Maybe if there are lot of DynamicResources in a list because they have the same anchor point it would be a lot faster for that case
Maybe if there are lot of DynamicResources in a list because they have the same anchor point it would be a lot faster for that case
Yeah, that's why I mean when I say it might be good if we can use it only when we get something from the dict, and it has (multiple) regexes.
I think that regexes are pretty common in the real life, that's why https://github.com/aio-libs/aiohttp/pull/9925 was added.
I agree that for constant path dict lookup could be faster.
As we now prefer constant matches over regexes, maybe a {path: resource} dict for plain and prefixed resources + hyperscan for regexes could help.
Let's measure it first.
I think that regexes are pretty common in the real life, that's why #9925 was added.
Yeah, but also many smaller apps, benchmarks etc. don't use them. I'm hoping with the right preprocessing we can optimise for both use cases.
As we now prefer constant matches over regexes
Also, it's not just static paths, but paths that end with regexes will be faster.
If the path is /foo/bar/{regex}, then this will likely be 2 dict lookups and a single regex match, so is very likely to still beat hyperscan.
Where hyperscan will likely improve performance is URLs like /org/{regex}/this/that/other, in which case this might result in several regex matches with the current code.
That's a massive improvement on the gitapi benchmark, so we'll definitely need to figure out some way to integrate this!
Just realized Hyperscan likely doesn't work on armv7l so maybe we need benchmarks for with Hyperscan not available as well. Maybe not though since armv7l is now slowly disappearing from the landscape now that the RPi shortage is over
Just realized Hyperscan likely doesn't work on armv7l so maybe we need benchmarks for with Hyperscan not available as well. Maybe not though since armv7l is now slowly disappearing from the landscape now that the RPi shortage is over
Perhaps parametrized benchmarks will help to measure both hyperscan and domestic versions. Also I'm planing to use parametrization for existing url_dispatcher tests (and maybe web_functional ones) for tests coverage and correctness checking.
From the flame graph on cod speed it looks like test_resolve_gitapi_subapps is producing system routes (maybe 404s)
Good finding, thanks! I'll modify benchmarks to check the resolved route to prevent this situation from becoming possible.
I guess an assertion will not hurt or significantly change timings.
Just realized Hyperscan likely doesn't work on armv7l so maybe we need benchmarks for with Hyperscan not available as well. Maybe not though since armv7l is now slowly disappearing from the landscape now that the RPi shortage is over
Perhaps parametrized benchmarks will help to measure both hyperscan and domestic versions. Also I'm planing to use parametrization for existing url_dispatcher tests (and maybe web_functional ones) for tests coverage and correctness checking.
Opened https://github.com/darvid/python-hyperscan/issues/170