aiohttp POC for hyperscan usage in UrlDispatcher

https://pypi.org/project/hyperscan/ could speed up url dispatcher if available.

Fallback to the current implementation is supported if hyperscan compilation fails or not available.

Nov 15 '24 12:11 asvetlov

CodSpeed Performance Report

Merging #9907 will degrade performances by 26.95%

_{Comparing hyperscan (a87dc32) with master (e79b2d5)}

Summary

⚡ 3 improvements
❌ 3 regressions
✅ 37 untouched benchmarks

:warning: Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`BASE`	`HEAD`	Change
❌	`test_resolve_dynamic_resource_url_with_many_dynamic_routes[pyloop]`	3.7 ms	5.1 ms	-26.95%
⚡	`test_resolve_dynamic_resource_url_with_many_dynamic_routes_with_common_prefix[pyloop]`	260.6 ms	4.9 ms	×53
❌	`test_resolve_dynamic_resource_url_with_many_static_routes[pyloop]`	3.7 ms	5 ms	-25.34%
⚡	`test_resolve_gitapi[pyloop]`	316 ms	5.8 ms	×55
⚡	`test_resolve_gitapi_subapps[pyloop]`	321 ms	7.9 ms	×41
❌	`test_resolve_static_root_route[pyloop]`	1.1 ms	1.2 ms	-10.84%

Nov 15 '24 12:11 codspeed-hq[bot]

Managed to finish the benchmarks before boarding the flight. They should run if you update with master.

Didn't have enough time to dig through downstream or write the tests yet though

Nov 15 '24 17:11 bdraco

Thank you!

Nov 15 '24 17:11 asvetlov

Codecov Report

Attention: Patch coverage is 82.55814% with 15 lines in your changes missing coverage. Please review.

Project coverage is 98.69%. Comparing base (e79b2d5) to head (a87dc32).

:white_check_mark: All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
aiohttp/web_urldispatcher.py	82.55%	8 Missing and 7 partials :warning:

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #9907      +/-   ##
==========================================
- Coverage   98.73%   98.69%   -0.05%     
==========================================
  Files         121      121              
  Lines       36747    36825      +78     
  Branches     4387     4404      +17     
==========================================
+ Hits        36281    36343      +62     
- Misses        314      323       +9     
- Partials      152      159       +7

Flag	Coverage Δ
CI-GHA	`98.57% <81.39%> (-0.05%)`	:arrow_down:
OS-Linux	`98.26% <81.39%> (-0.05%)`	:arrow_down:
OS-Windows	`96.06% <66.27%> (-0.08%)`	:arrow_down:
OS-macOS	`97.36% <79.06%> (-0.04%)`	:arrow_down:
Py-3.10.11	`97.22% <81.39%> (-0.03%)`	:arrow_down:
Py-3.10.15	`97.75% <79.06%> (-0.10%)`	:arrow_down:
Py-3.11.10	`97.78% <79.06%> (-0.06%)`	:arrow_down:
Py-3.11.9	`97.26% <81.39%> (-0.05%)`	:arrow_down:
Py-3.12.7	`98.32% <81.39%> (-0.05%)`	:arrow_down:
Py-3.13.0	`98.31% <81.39%> (-0.06%)`	:arrow_down:
Py-3.9.13	`97.13% <81.17%> (-0.05%)`	:arrow_down:
Py-3.9.20	`97.67% <78.82%> (-0.05%)`	:arrow_down:
Py-pypy7.3.16	`97.25% <65.88%> (-0.08%)`	:arrow_down:
VM-macos	`97.36% <79.06%> (-0.04%)`	:arrow_down:
VM-ubuntu	`98.26% <81.39%> (-0.05%)`	:arrow_down:
VM-windows	`96.06% <66.27%> (-0.08%)`	:arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Nov 16 '24 11:11 codecov[bot]

Seems to be a lot slower in the benchmarks. I suspect that's due to our previous refactoring, so in the typical case we're doing a couple of dictionary lookups, which is probably a lot faster than hyperscan. Maybe if there's a hyperscan under each dictionary lookup, that might be an improvement?

Nov 16 '24 13:11 Dreamsorcerer

i.e. Instead of replacing the entire loop, maybe it could replace the await candidate.resolve(request) line in certain circumstances (ideally only when there are regex patterns actually involved).

Nov 16 '24 13:11 Dreamsorcerer

Maybe if there are lot of DynamicResources in a list because they have the same anchor point it would be a lot faster for that case

Nov 16 '24 14:11 bdraco

Maybe if there are lot of DynamicResources in a list because they have the same anchor point it would be a lot faster for that case

Yeah, that's why I mean when I say it might be good if we can use it only when we get something from the dict, and it has (multiple) regexes.

Nov 16 '24 14:11 Dreamsorcerer

I think that regexes are pretty common in the real life, that's why https://github.com/aio-libs/aiohttp/pull/9925 was added.

I agree that for constant path dict lookup could be faster. As we now prefer constant matches over regexes, maybe a {path: resource} dict for plain and prefixed resources + hyperscan for regexes could help.

Let's measure it first.

Nov 16 '24 14:11 asvetlov

I think that regexes are pretty common in the real life, that's why #9925 was added.

Yeah, but also many smaller apps, benchmarks etc. don't use them. I'm hoping with the right preprocessing we can optimise for both use cases.

Nov 16 '24 14:11 Dreamsorcerer

As we now prefer constant matches over regexes

Also, it's not just static paths, but paths that end with regexes will be faster. If the path is /foo/bar/{regex}, then this will likely be 2 dict lookups and a single regex match, so is very likely to still beat hyperscan. Where hyperscan will likely improve performance is URLs like /org/{regex}/this/that/other, in which case this might result in several regex matches with the current code.

Nov 16 '24 14:11 Dreamsorcerer

That's a massive improvement on the gitapi benchmark, so we'll definitely need to figure out some way to integrate this!

Nov 16 '24 15:11 Dreamsorcerer

Just realized Hyperscan likely doesn't work on armv7l so maybe we need benchmarks for with Hyperscan not available as well. Maybe not though since armv7l is now slowly disappearing from the landscape now that the RPi shortage is over

Nov 17 '24 06:11 bdraco

Just realized Hyperscan likely doesn't work on armv7l so maybe we need benchmarks for with Hyperscan not available as well. Maybe not though since armv7l is now slowly disappearing from the landscape now that the RPi shortage is over

Perhaps parametrized benchmarks will help to measure both hyperscan and domestic versions. Also I'm planing to use parametrization for existing url_dispatcher tests (and maybe web_functional ones) for tests coverage and correctness checking.

Nov 17 '24 07:11 asvetlov

From the flame graph on cod speed it looks like test_resolve_gitapi_subapps is producing system routes (maybe 404s)

Nov 17 '24 08:11 bdraco

Good finding, thanks! I'll modify benchmarks to check the resolved route to prevent this situation from becoming possible.

I guess an assertion will not hurt or significantly change timings.

Nov 17 '24 09:11 asvetlov

Just realized Hyperscan likely doesn't work on armv7l so maybe we need benchmarks for with Hyperscan not available as well. Maybe not though since armv7l is now slowly disappearing from the landscape now that the RPi shortage is over

Perhaps parametrized benchmarks will help to measure both hyperscan and domestic versions. Also I'm planing to use parametrization for existing url_dispatcher tests (and maybe web_functional ones) for tests coverage and correctness checking.

Opened https://github.com/darvid/python-hyperscan/issues/170

Feb 06 '25 17:02 bdraco