aiohttp icon indicating copy to clipboard operation
aiohttp copied to clipboard

POC for hyperscan usage in UrlDispatcher

Open asvetlov opened this issue 1 year ago • 17 comments

https://pypi.org/project/hyperscan/ could speed up url dispatcher if available.

Fallback to the current implementation is supported if hyperscan compilation fails or not available.

asvetlov avatar Nov 15 '24 12:11 asvetlov

CodSpeed Performance Report

Merging #9907 will degrade performances by 26.95%

Comparing hyperscan (a87dc32) with master (e79b2d5)

Summary

⚡ 3 improvements
❌ 3 regressions
✅ 37 untouched benchmarks

:warning: Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark BASE HEAD Change
test_resolve_dynamic_resource_url_with_many_dynamic_routes[pyloop] 3.7 ms 5.1 ms -26.95%
test_resolve_dynamic_resource_url_with_many_dynamic_routes_with_common_prefix[pyloop] 260.6 ms 4.9 ms ×53
test_resolve_dynamic_resource_url_with_many_static_routes[pyloop] 3.7 ms 5 ms -25.34%
test_resolve_gitapi[pyloop] 316 ms 5.8 ms ×55
test_resolve_gitapi_subapps[pyloop] 321 ms 7.9 ms ×41
test_resolve_static_root_route[pyloop] 1.1 ms 1.2 ms -10.84%

codspeed-hq[bot] avatar Nov 15 '24 12:11 codspeed-hq[bot]

Managed to finish the benchmarks before boarding the flight. They should run if you update with master.

Didn't have enough time to dig through downstream or write the tests yet though

bdraco avatar Nov 15 '24 17:11 bdraco

Thank you!

asvetlov avatar Nov 15 '24 17:11 asvetlov

Codecov Report

Attention: Patch coverage is 82.55814% with 15 lines in your changes missing coverage. Please review.

Project coverage is 98.69%. Comparing base (e79b2d5) to head (a87dc32).

:white_check_mark: All tests successful. No failed tests found.

Files with missing lines Patch % Lines
aiohttp/web_urldispatcher.py 82.55% 8 Missing and 7 partials :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9907      +/-   ##
==========================================
- Coverage   98.73%   98.69%   -0.05%     
==========================================
  Files         121      121              
  Lines       36747    36825      +78     
  Branches     4387     4404      +17     
==========================================
+ Hits        36281    36343      +62     
- Misses        314      323       +9     
- Partials      152      159       +7     
Flag Coverage Δ
CI-GHA 98.57% <81.39%> (-0.05%) :arrow_down:
OS-Linux 98.26% <81.39%> (-0.05%) :arrow_down:
OS-Windows 96.06% <66.27%> (-0.08%) :arrow_down:
OS-macOS 97.36% <79.06%> (-0.04%) :arrow_down:
Py-3.10.11 97.22% <81.39%> (-0.03%) :arrow_down:
Py-3.10.15 97.75% <79.06%> (-0.10%) :arrow_down:
Py-3.11.10 97.78% <79.06%> (-0.06%) :arrow_down:
Py-3.11.9 97.26% <81.39%> (-0.05%) :arrow_down:
Py-3.12.7 98.32% <81.39%> (-0.05%) :arrow_down:
Py-3.13.0 98.31% <81.39%> (-0.06%) :arrow_down:
Py-3.9.13 97.13% <81.17%> (-0.05%) :arrow_down:
Py-3.9.20 97.67% <78.82%> (-0.05%) :arrow_down:
Py-pypy7.3.16 97.25% <65.88%> (-0.08%) :arrow_down:
VM-macos 97.36% <79.06%> (-0.04%) :arrow_down:
VM-ubuntu 98.26% <81.39%> (-0.05%) :arrow_down:
VM-windows 96.06% <66.27%> (-0.08%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Nov 16 '24 11:11 codecov[bot]

Seems to be a lot slower in the benchmarks. I suspect that's due to our previous refactoring, so in the typical case we're doing a couple of dictionary lookups, which is probably a lot faster than hyperscan. Maybe if there's a hyperscan under each dictionary lookup, that might be an improvement?

Dreamsorcerer avatar Nov 16 '24 13:11 Dreamsorcerer

i.e. Instead of replacing the entire loop, maybe it could replace the await candidate.resolve(request) line in certain circumstances (ideally only when there are regex patterns actually involved).

Dreamsorcerer avatar Nov 16 '24 13:11 Dreamsorcerer

Maybe if there are lot of DynamicResources in a list because they have the same anchor point it would be a lot faster for that case

bdraco avatar Nov 16 '24 14:11 bdraco

Maybe if there are lot of DynamicResources in a list because they have the same anchor point it would be a lot faster for that case

Yeah, that's why I mean when I say it might be good if we can use it only when we get something from the dict, and it has (multiple) regexes.

Dreamsorcerer avatar Nov 16 '24 14:11 Dreamsorcerer

I think that regexes are pretty common in the real life, that's why https://github.com/aio-libs/aiohttp/pull/9925 was added.

I agree that for constant path dict lookup could be faster. As we now prefer constant matches over regexes, maybe a {path: resource} dict for plain and prefixed resources + hyperscan for regexes could help.

Let's measure it first.

asvetlov avatar Nov 16 '24 14:11 asvetlov

I think that regexes are pretty common in the real life, that's why #9925 was added.

Yeah, but also many smaller apps, benchmarks etc. don't use them. I'm hoping with the right preprocessing we can optimise for both use cases.

Dreamsorcerer avatar Nov 16 '24 14:11 Dreamsorcerer

As we now prefer constant matches over regexes

Also, it's not just static paths, but paths that end with regexes will be faster. If the path is /foo/bar/{regex}, then this will likely be 2 dict lookups and a single regex match, so is very likely to still beat hyperscan. Where hyperscan will likely improve performance is URLs like /org/{regex}/this/that/other, in which case this might result in several regex matches with the current code.

Dreamsorcerer avatar Nov 16 '24 14:11 Dreamsorcerer

That's a massive improvement on the gitapi benchmark, so we'll definitely need to figure out some way to integrate this!

Dreamsorcerer avatar Nov 16 '24 15:11 Dreamsorcerer

Just realized Hyperscan likely doesn't work on armv7l so maybe we need benchmarks for with Hyperscan not available as well. Maybe not though since armv7l is now slowly disappearing from the landscape now that the RPi shortage is over

bdraco avatar Nov 17 '24 06:11 bdraco

Just realized Hyperscan likely doesn't work on armv7l so maybe we need benchmarks for with Hyperscan not available as well. Maybe not though since armv7l is now slowly disappearing from the landscape now that the RPi shortage is over

Perhaps parametrized benchmarks will help to measure both hyperscan and domestic versions. Also I'm planing to use parametrization for existing url_dispatcher tests (and maybe web_functional ones) for tests coverage and correctness checking.

asvetlov avatar Nov 17 '24 07:11 asvetlov

From the flame graph on cod speed it looks like test_resolve_gitapi_subapps is producing system routes (maybe 404s)

bdraco avatar Nov 17 '24 08:11 bdraco

Good finding, thanks! I'll modify benchmarks to check the resolved route to prevent this situation from becoming possible.

I guess an assertion will not hurt or significantly change timings.

asvetlov avatar Nov 17 '24 09:11 asvetlov

Just realized Hyperscan likely doesn't work on armv7l so maybe we need benchmarks for with Hyperscan not available as well. Maybe not though since armv7l is now slowly disappearing from the landscape now that the RPi shortage is over

Perhaps parametrized benchmarks will help to measure both hyperscan and domestic versions. Also I'm planing to use parametrization for existing url_dispatcher tests (and maybe web_functional ones) for tests coverage and correctness checking.

Opened https://github.com/darvid/python-hyperscan/issues/170

bdraco avatar Feb 06 '25 17:02 bdraco