SVE2 HISTSEG support for fast ondemand parsing on ARM
This PR uses the SVE2 HISTSEG instruction to improve the performance of ondemand parsing on ARM CPUs.
https://developer.arm.com/documentation/ddi0602/latest/SVE-Instructions/HISTSEG--Count-matching-elements-in-vector-segments-
Creating the bitmasks for each structural character is very expensive on ARM because it lacks a pmovmskb-like instruction. HISTSEG gives us the number of occurrences of each character of interest in a single instruction, with low latency and high throughput on Neoverse V2 CPUs.
This lets us skip mask creation for characters that are not present, and even implement fast paths, for example when the only characters of interest in a block are quotes. It is worth noting that x86 has no equivalent instruction: the AVX-512 VP2INTERSECT instruction only supports 32-bit and 64-bit elements, not int8.
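To make the idea concrete, here is a minimal scalar sketch of the count-then-skip pattern described above. It is not the code in this PR: `histseg16`, `byte_mask`, `scan_block`, `BlockScan` and the needle set are hypothetical names chosen for illustration, and the inner loops only model what HISTSEG computes for one 128-bit segment (on SVE2 hardware the counting step would presumably map to the ACLE intrinsic `svhistseg_u8`).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical needle set: characters whose positions the parser cares about.
// Only the first kNumNeedles lanes are meaningful; the rest are ignored.
constexpr std::size_t kNumNeedles = 6;
constexpr std::array<std::uint8_t, 16> kNeedles = {'"', '\\', '{', '}', '[', ']'};

// Scalar model of HISTSEG on one 128-bit segment: counts[i] is the number of
// bytes in `segment` that are equal to needles[i].
std::array<std::uint8_t, 16> histseg16(const std::array<std::uint8_t, 16>& needles,
                                       const std::array<std::uint8_t, 16>& segment) {
  std::array<std::uint8_t, 16> counts{};
  for (std::size_t i = 0; i < 16; ++i)
    for (std::size_t j = 0; j < 16; ++j)
      if (needles[i] == segment[j]) ++counts[i];
  return counts;
}

// The expensive step on ARM: a bitmask with bit j set when segment[j] == c.
std::uint16_t byte_mask(const std::array<std::uint8_t, 16>& segment, std::uint8_t c) {
  std::uint16_t mask = 0;
  for (std::size_t j = 0; j < 16; ++j)
    if (segment[j] == c) mask = static_cast<std::uint16_t>(mask | (1u << j));
  return mask;
}

struct BlockScan {
  std::array<std::uint8_t, 16> counts{};          // per-needle occurrence counts
  std::array<std::uint16_t, kNumNeedles> masks{}; // position masks, 0 if skipped
  int masks_built = 0;                            // how many masks were computed
};

// Scans up to 16 bytes; shorter inputs are padded with spaces.
BlockScan scan_block(std::string_view block) {
  std::array<std::uint8_t, 16> segment;
  segment.fill(' ');
  for (std::size_t j = 0; j < 16 && j < block.size(); ++j)
    segment[j] = static_cast<std::uint8_t>(block[j]);

  BlockScan result;
  result.counts = histseg16(kNeedles, segment);
  for (std::size_t i = 0; i < kNumNeedles; ++i) {
    if (result.counts[i] == 0) continue;  // character absent: skip its mask
    result.masks[i] = byte_mask(segment, kNeedles[i]);
    ++result.masks_built;
  }
  return result;
}
```

For a block such as `abc"def"ghijklmn`, only the quote count is non-zero, so a single mask is built instead of six. A quotes-only fast path could be selected the same way, by checking that every other count is zero.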
Performance:

Master branch

| Benchmark | Time | CPU | Iterations | Throughput | Label |
| --- | --- | --- | --- | --- | --- |
| twitter/SonicOnDemand_Normal | 22625 ns | 22624 ns | 30718 | bytes_per_second=10.8832Gi/s | Normal |
| citm_catalog/SonicOnDemand_Fronter | 12861 ns | 12862 ns | 54399 | bytes_per_second=125.067Gi/s | Fronter |
| twitter/SonicOnDemand_NotFound | 22423 ns | 22422 ns | 31349 | bytes_per_second=10.9814Gi/s | NotFound |

This PR

| Benchmark | Time | CPU | Iterations | Throughput | Label |
| --- | --- | --- | --- | --- | --- |
| twitter/SonicOnDemand_Normal | 6464 ns | 6464 ns | 108097 | bytes_per_second=38.0904Gi/s | Normal |
| citm_catalog/SonicOnDemand_Fronter | 6978 ns | 6978 ns | 100373 | bytes_per_second=230.538Gi/s | Fronter |
| twitter/SonicOnDemand_NotFound | 6325 ns | 6325 ns | 110703 | bytes_per_second=38.9312Gi/s | NotFound |
This is faster than even the most recent x86 CPUs, which show results of around 24Gi/s for twitter.
Thanks to @supermartian!
This PR is contributed by NVIDIA.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 73.40%. Comparing base (4250a05) to head (6603f46). Report is 12 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #108 +/- ##
==========================================
- Coverage 74.79% 73.40% -1.40%
==========================================
Files 21 22 +1
Lines 2436 2741 +305
Branches 667 749 +82
==========================================
+ Hits 1822 2012 +190
- Misses 297 430 +133
+ Partials 317 299 -18
#110 The CI bugs have been fixed, please rebase onto master.
@xiegx94 Rebased! Sorry for the long wait :(
@xiegx94 Thanks for the comments! I think I've addressed them :)
Hi @xiegx94 :) any chance you can look at it soon? Thanks!
@xiegx94 @liuq19 any chance of getting this merged? Thanks!