sonic-cpp

SVE2 HISTSEG support for fast on-demand parsing on ARM

[Open] emcastillo opened this issue 8 months ago • 5 comments

This PR uses the SVE2 HISTSEG instruction to improve the performance of on-demand parsing on ARM CPUs.

https://developer.arm.com/documentation/ddi0602/latest/SVE-Instructions/HISTSEG--Count-matching-elements-in-vector-segments-

Creating a bitmask for each structural character is very expensive on ARM because there is no instruction similar to x86 `pmovmskb`. HISTSEG gives us the number of occurrences of each character of interest in a single instruction, with low latency and high throughput on Neoverse V2 CPUs.
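To make the HISTSEG semantics concrete, here is a minimal scalar model of a single 128-bit segment. It assumes the first operand holds the characters of interest ("needles") and the second holds 16 input bytes; each result lane is the number of input bytes equal to that lane's needle. The names `Segment`, `histseg_model`, and `load_segment` are hypothetical and not taken from the PR. On SVE2 hardware the whole double loop is one instruction (exposed in the ACLE as `svhistseg_u8`), applied to every 128-bit segment of the vector.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// One 128-bit SVE segment viewed as 16 unsigned bytes.
using Segment = std::array<std::uint8_t, 16>;

// Scalar model of SVE2 HISTSEG restricted to a single 128-bit segment:
// result[i] = number of bytes in `data` equal to `needles[i]`.
// This only spells out the semantics; it is not how the PR implements it.
Segment histseg_model(const Segment& needles, const Segment& data) {
    Segment result{};
    for (std::size_t i = 0; i < result.size(); ++i) {
        std::uint8_t count = 0;
        for (std::uint8_t byte : data) {
            if (byte == needles[i]) ++count;
        }
        result[i] = count;
    }
    return result;
}

// Copies the first 16 bytes of a C string into a segment.
Segment load_segment(const char* text) {
    assert(std::strlen(text) >= 16 && "need at least one full segment");
    Segment seg{};
    std::memcpy(seg.data(), text, seg.size());
    return seg;
}
```

For example, with needles `"`, `\`, `{`, `}`, `[`, `]`, `:`, `,` (remaining lanes padded with `"`) and the 16-byte input `{"a":1,"bc":[2]}`, the quote lane reports 4 and the backslash lane reports 0, so a parser could skip building a backslash mask for that block.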

This lets us skip mask creation for characters that are not present, and even add fast paths for cases such as a block that contains only quotes. It is worth noting that x86 has no equivalent instruction, and the AVX-512 VP2INTERSECT instruction only supports 32- and 64-bit elements, not int8.
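A rough sketch of how per-character counts could drive the skip-and-fast-path logic described above. It assumes 16-byte blocks and uses plain loops in place of both HISTSEG and the usual NEON movemask emulation; `classify_block`, `BlockMasks`, `kStructural`, and the `only_quotes` flag are illustrative names, not the PR's implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical lane order of the characters of interest.
constexpr std::array<char, 8> kStructural = {'"', '\\', '{', '}', '[', ']', ':', ','};

// Stand-in for HISTSEG: how often each structural char occurs in the block.
std::array<std::uint8_t, 8> count_structural(std::string_view block) {
    std::array<std::uint8_t, 8> counts{};
    for (std::size_t k = 0; k < kStructural.size(); ++k) {
        for (char c : block) {
            if (c == kStructural[k]) ++counts[k];
        }
    }
    return counts;
}

// Stand-in for a movemask: bit i is set when block[i] == c.
// This per-character step is the expensive one on ARM.
std::uint16_t build_mask(std::string_view block, char c) {
    std::uint16_t mask = 0;
    for (std::size_t i = 0; i < block.size(); ++i) {
        if (block[i] == c) mask |= static_cast<std::uint16_t>(1u << i);
    }
    return mask;
}

struct BlockMasks {
    std::array<std::uint16_t, 8> masks{};  // one bitmask per structural char
    int masks_built = 0;                   // how many masks were actually computed
    bool only_quotes = false;              // true when '"' is the only structural char
};

BlockMasks classify_block(std::string_view block) {
    assert(block.size() <= 16 && "one 128-bit block at a time");
    BlockMasks out;
    const auto counts = count_structural(block);

    bool other_structural = false;
    for (std::size_t k = 1; k < counts.size(); ++k) {
        if (counts[k] != 0) other_structural = true;
    }
    out.only_quotes = counts[0] != 0 && !other_structural;

    for (std::size_t k = 0; k < kStructural.size(); ++k) {
        if (counts[k] == 0) continue;  // absent: skip mask creation, mask stays 0
        out.masks[k] = build_mask(block, kStructural[k]);
        ++out.masks_built;
    }
    return out;
}
```

For a block such as `"abc" "de"`, only the quote mask is built and `only_quotes` is set, which is the point where a quote-only fast path could take over.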

Performance:

Master branch

Benchmark                                Time         CPU   Iterations  UserCounters
twitter/SonicOnDemand_Normal         22625 ns    22624 ns        30718  bytes_per_second=10.8832Gi/s Normal
citm_catalog/SonicOnDemand_Fronter   12861 ns    12862 ns        54399  bytes_per_second=125.067Gi/s Fronter
twitter/SonicOnDemand_NotFound       22423 ns    22422 ns        31349  bytes_per_second=10.9814Gi/s NotFound

This PR

Benchmark                                Time         CPU   Iterations  UserCounters
twitter/SonicOnDemand_Normal          6464 ns     6464 ns       108097  bytes_per_second=38.0904Gi/s Normal
citm_catalog/SonicOnDemand_Fronter    6978 ns     6978 ns       100373  bytes_per_second=230.538Gi/s Fronter
twitter/SonicOnDemand_NotFound        6325 ns     6325 ns       110703  bytes_per_second=38.9312Gi/s NotFound
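The speedups behind these numbers can be checked with a little arithmetic: speedup is the master time divided by the PR time. This sketch hard-codes the Time column from the two runs; `BenchRow`, `kRows`, and `speedup` are hypothetical helpers, not part of the sonic-cpp benchmark suite.

```cpp
#include <array>
#include <cassert>
#include <string_view>

// One benchmark row: wall-clock Time (ns) on master and with this PR.
struct BenchRow {
    std::string_view name;
    double master_ns;
    double pr_ns;
};

// Values copied from the Time column of the two runs.
constexpr std::array<BenchRow, 3> kRows = {{
    {"twitter/SonicOnDemand_Normal", 22625.0, 6464.0},
    {"citm_catalog/SonicOnDemand_Fronter", 12861.0, 6978.0},
    {"twitter/SonicOnDemand_NotFound", 22423.0, 6325.0},
}};

// Speedup = old time / new time; values above 1 mean the PR is faster.
double speedup(const BenchRow& row) {
    assert(row.pr_ns > 0.0);
    return row.master_ns / row.pr_ns;
}
```

This works out to roughly 3.5x for both twitter benchmarks and about 1.84x for citm_catalog, in line with the ratios of the bytes_per_second figures.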

This is faster than even the most recent x86 CPUs, which show results of around 24 Gi/s on twitter.

Thanks to @supermartian !

This PR is contributed by NVIDIA.

emcastillo, May 11 '25

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.40%. Comparing base (4250a05) to head (6603f46). Report is 12 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #108      +/-   ##
==========================================
- Coverage   74.79%   73.40%   -1.40%     
==========================================
  Files          21       22       +1     
  Lines        2436     2741     +305     
  Branches      667      749      +82     
==========================================
+ Hits         1822     2012     +190     
- Misses        297      430     +133     
+ Partials      317      299      -18     
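As a sanity check on the table, the figures are consistent with coverage being hits / (hits + misses + partials), and the hit, miss, and partial counts add up to the line totals on both sides. The `Coverage` struct and `coverage_percent` function are illustrative only and not part of Codecov's tooling.

```cpp
#include <cassert>

// Line counts taken from one column of the Codecov diff.
struct Coverage {
    int hits;
    int misses;
    int partials;
};

// Coverage ratio as a percentage: hits / (hits + misses + partials).
double coverage_percent(const Coverage& c) {
    const int lines = c.hits + c.misses + c.partials;
    assert(lines > 0);
    return 100.0 * c.hits / lines;
}
```

Plugging in master (1822, 297, 317) gives about 74.79% and head (2012, 430, 299) about 73.40%, matching the report.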


codecov-commenter, May 12 '25

The CI bugs have been fixed in #110; please rebase onto master.

xiegx94, Jun 11 '25

@xiegx94 Rebased! Sorry for the long wait :(

emcastillo, Jul 30 '25

@xiegx94 Thanks for the comments! I think I've addressed them :)

emcastillo, Aug 17 '25

Hi @xiegx94 :) any chance you can look at it soon? Thanks!

emcastillo, Sep 26 '25

@xiegx94 @liuq19 any chance of getting this merged? Thanks!

emcastillo, Nov 28 '25