jdk icon indicating copy to clipboard operation
jdk copied to clipboard

8317721: RISC-V: Implement CRC32 intrinsic

Open ArsenyBochkarev opened this issue 1 year ago • 22 comments

Hi everyone! Please review this port of AArch64 _updateBytesCRC32, _updateByteBufferCRC32 and _updateCRC32 intrinsics. This patch introduces only the plain (non-vectorized, no Zbc) version.

Correctness checks

Tier 1/2 tests are ok.

Performance results on T-Head board

Results for enabled intrinsic:

Used test is test/micro/org/openjdk/bench/java/util/TestCRC32.java

Benchmark (count) Mode Cnt Score Error Units
CRC32.TestCRC32.testCRC32Update 64 thrpt 24 3730.929 37.773 ops/ms
CRC32.TestCRC32.testCRC32Update 128 thrpt 24 2126.673 2.032 ops/ms
CRC32.TestCRC32.testCRC32Update 256 thrpt 24 1134.330 6.714 ops/ms
CRC32.TestCRC32.testCRC32Update 512 thrpt 24 584.017 2.267 ops/ms
CRC32.TestCRC32.testCRC32Update 2048 thrpt 24 151.173 0.346 ops/ms
CRC32.TestCRC32.testCRC32Update 16384 thrpt 24 19.113 0.008 ops/ms
CRC32.TestCRC32.testCRC32Update 65536 thrpt 24 4.647 0.022 ops/ms

Results for disabled intrinsic:

Benchmark (count) Mode Cnt Score Error Units
CRC32.TestCRC32.testCRC32Update 64 thrpt 15 798.365 35.486 ops/ms
CRC32.TestCRC32.testCRC32Update 128 thrpt 15 677.756 46.619 ops/ms
CRC32.TestCRC32.testCRC32Update 256 thrpt 15 552.781 27.143 ops/ms
CRC32.TestCRC32.testCRC32Update 512 thrpt 15 429.304 12.518 ops/ms
CRC32.TestCRC32.testCRC32Update 2048 thrpt 15 166.738 0.935 ops/ms
CRC32.TestCRC32.testCRC32Update 16384 thrpt 15 25.060 0.034 ops/ms
CRC32.TestCRC32.testCRC32Update 65536 thrpt 15 6.196 0.030 ops/ms

Progress

  • [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue

Issue

  • JDK-8317721: RISC-V: Implement CRC32 intrinsic (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/17046/head:pull/17046
$ git checkout pull/17046

Update a local copy of the PR:
$ git checkout pull/17046
$ git pull https://git.openjdk.org/jdk.git pull/17046/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 17046

View PR using the GUI difftool:
$ git pr show -t 17046

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/17046.diff

Webrev

Link to Webrev Comment

ArsenyBochkarev avatar Dec 11 '23 01:12 ArsenyBochkarev

/cc hotspot-compiler

ArsenyBochkarev avatar Dec 11 '23 01:12 ArsenyBochkarev

:wave: Welcome back ArsenyBochkarev! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar Dec 11 '23 02:12 bridgekeeper[bot]

@ArsenyBochkarev The hotspot-compiler label was successfully added.

openjdk[bot] avatar Dec 11 '23 02:12 openjdk[bot]

Performance comparison for disabling/enabling Zba on StarFive VisionFive 2 board:

-XX:-UseZba:

Benchmark (count) Mode Cnt Score Error Units
CRC32.TestCRC32.testCRC32Update 64 thrpt 12 3563.320 3.326 ops/ms
CRC32.TestCRC32.testCRC32Update 128 thrpt 12 1928.837 2.234 ops/ms
CRC32.TestCRC32.testCRC32Update 256 thrpt 12 1005.273 1.953 ops/ms
CRC32.TestCRC32.testCRC32Update 512 thrpt 12 512.550 1.718 ops/ms
CRC32.TestCRC32.testCRC32Update 2048 thrpt 12 130.396 0.341 ops/ms
CRC32.TestCRC32.testCRC32Update 16384 thrpt 12 16.319 0.073 ops/ms
CRC32.TestCRC32.testCRC32Update 65536 thrpt 12 3.913 0.011 ops/ms

-XX:+UseZba:

Benchmark (count) Mode Cnt Score Error Units
CRC32.TestCRC32.testCRC32Update 64 thrpt 12 4206.654 0.547 ops/ms
CRC32.TestCRC32.testCRC32Update 128 thrpt 12 2308.843 3.565 ops/ms
CRC32.TestCRC32.testCRC32Update 256 thrpt 12 1214.727 0.305 ops/ms
CRC32.TestCRC32.testCRC32Update 512 thrpt 12 623.173 0.651 ops/ms
CRC32.TestCRC32.testCRC32Update 2048 thrpt 12 158.965 0.376 ops/ms
CRC32.TestCRC32.testCRC32Update 16384 thrpt 12 19.934 0.055 ops/ms
CRC32.TestCRC32.testCRC32Update 65536 thrpt 12 4.730 0.007 ops/ms

-XX:-UseCRC32Intrinsics:

Benchmark (count) Mode Cnt Score Error Units
CRC32.TestCRC32.testCRC32Update 64 thrpt 12 1390.530 42.217 ops/ms
CRC32.TestCRC32.testCRC32Update 128 thrpt 12 1109.742 24.201 ops/ms
CRC32.TestCRC32.testCRC32Update 256 thrpt 12 805.345 12.155 ops/ms
CRC32.TestCRC32.testCRC32Update 512 thrpt 12 520.965 5.651 ops/ms
CRC32.TestCRC32.testCRC32Update 2048 thrpt 12 169.591 0.747 ops/ms
CRC32.TestCRC32.testCRC32Update 16384 thrpt 12 22.624 0.139 ops/ms
CRC32.TestCRC32.testCRC32Update 65536 thrpt 12 5.430 0.016 ops/ms

ArsenyBochkarev avatar Dec 11 '23 15:12 ArsenyBochkarev

Thanks! (not a review, just an ack)

There are two other version we probably need also, using carry-less-multiplication. There is a scalar clmul in Zbc and there is a vclmul in vector.

robehn avatar Dec 12 '23 09:12 robehn

Performance measurements on the same benchmark, on T-Head board. I used the -XX:TieredStopAtLevel=1 flag:

Benchmark (count) Mode Cnt Score Error Units
CRC32.TestCRC32.testCRC32Update 64 thrpt 12 3617.860 17.463 ops/ms
CRC32.TestCRC32.testCRC32Update 128 thrpt 12 2253.626 4.739 ops/ms
CRC32.TestCRC32.testCRC32Update 256 thrpt 12 1242.516 83.245 ops/ms
CRC32.TestCRC32.testCRC32Update 512 thrpt 12 687.339 1.712 ops/ms
CRC32.TestCRC32.testCRC32Update 2048 thrpt 12 183.016 0.258 ops/ms
CRC32.TestCRC32.testCRC32Update 16384 thrpt 12 23.368 0.133 ops/ms
CRC32.TestCRC32.testCRC32Update 65536 thrpt 12 5.640 0.023 ops/ms

Current results with no such flag:

Benchmark (count) Mode Cnt Score Error Units
CRC32.TestCRC32.testCRC32Update 64 thrpt 12 4270.963 47.669 ops/ms
CRC32.TestCRC32.testCRC32Update 128 thrpt 12 2448.523 72.738 ops/ms
CRC32.TestCRC32.testCRC32Update 256 thrpt 12 1360.692 30.246 ops/ms
CRC32.TestCRC32.testCRC32Update 512 thrpt 12 709.729 7.413 ops/ms
CRC32.TestCRC32.testCRC32Update 2048 thrpt 12 183.903 1.051 ops/ms
CRC32.TestCRC32.testCRC32Update 16384 thrpt 12 23.320 0.038 ops/ms
CRC32.TestCRC32.testCRC32Update 65536 thrpt 12 5.651 0.017 ops/ms

ArsenyBochkarev avatar Dec 21 '23 22:12 ArsenyBochkarev

Performance comparison for disabling/enabling Zba on StarFive VisionFive 2 board:

-XX:-UseZba:

Benchmark	(count)	Mode	Cnt	Score	Error	Units

CRC32.TestCRC32.testCRC32Update	512	thrpt	12	512.550	1.718	ops/ms
CRC32.TestCRC32.testCRC32Update	2048	thrpt	12	130.396	0.341	ops/ms
CRC32.TestCRC32.testCRC32Update	16384	thrpt	12	16.319	0.073	ops/ms
CRC32.TestCRC32.testCRC32Update	65536	thrpt	12	3.913	0.011	ops/ms

-XX:+UseZba:

Benchmark	(count)	Mode	Cnt	Score	Error	Units

CRC32.TestCRC32.testCRC32Update	512	thrpt	12	623.173	0.651	ops/ms
CRC32.TestCRC32.testCRC32Update	2048	thrpt	12	158.965	0.376	ops/ms
CRC32.TestCRC32.testCRC32Update	16384	thrpt	12	19.934	0.055	ops/ms
CRC32.TestCRC32.testCRC32Update	65536	thrpt	12	4.730	0.007	ops/ms

-XX:-UseCRC32Intrinsics:

Benchmark	(count)	Mode	Cnt	Score	Error	Units

CRC32.TestCRC32.testCRC32Update	512	thrpt	12	520.965	5.651	ops/ms
CRC32.TestCRC32.testCRC32Update	2048	thrpt	12	169.591	0.747	ops/ms
CRC32.TestCRC32.testCRC32Update	16384	thrpt	12	22.624	0.139	ops/ms
CRC32.TestCRC32.testCRC32Update	65536	thrpt	12	5.430	0.016	ops/ms

Seems there is regression when count >= 512, especially when count >= 2048. And I suppose that big message is common case for CRC32 usage?

Hamlin-Li avatar Dec 22 '23 10:12 Hamlin-Li

Performance comparison for disabling/enabling Zba on StarFive VisionFive 2 board:

-XX:-UseZba:

Benchmark	(count)	Mode	Cnt	Score	Error	Units

CRC32.TestCRC32.testCRC32Update	512	thrpt	12	512.550	1.718	ops/ms
CRC32.TestCRC32.testCRC32Update	2048	thrpt	12	130.396	0.341	ops/ms
CRC32.TestCRC32.testCRC32Update	16384	thrpt	12	16.319	0.073	ops/ms
CRC32.TestCRC32.testCRC32Update	65536	thrpt	12	3.913	0.011	ops/ms

-XX:+UseZba:

Benchmark	(count)	Mode	Cnt	Score	Error	Units

CRC32.TestCRC32.testCRC32Update	512	thrpt	12	623.173	0.651	ops/ms
CRC32.TestCRC32.testCRC32Update	2048	thrpt	12	158.965	0.376	ops/ms
CRC32.TestCRC32.testCRC32Update	16384	thrpt	12	19.934	0.055	ops/ms
CRC32.TestCRC32.testCRC32Update	65536	thrpt	12	4.730	0.007	ops/ms

-XX:-UseCRC32Intrinsics:

Benchmark	(count)	Mode	Cnt	Score	Error	Units

CRC32.TestCRC32.testCRC32Update	512	thrpt	12	520.965	5.651	ops/ms
CRC32.TestCRC32.testCRC32Update	2048	thrpt	12	169.591	0.747	ops/ms
CRC32.TestCRC32.testCRC32Update	16384	thrpt	12	22.624	0.139	ops/ms
CRC32.TestCRC32.testCRC32Update	65536	thrpt	12	5.430	0.016	ops/ms

Seems there is regression when count >= 512, especially when count >= 2048. And I suppose that big message is common case for CRC32 usage?

Hmm, I don't know about common CRC32 count parameter sizes, maybe others know? :sweat: Or maybe anyone knows if there are other ways to optimize such plain version of intrinsic more?

ArsenyBochkarev avatar Dec 22 '23 15:12 ArsenyBochkarev

Hello again everyone! I was able to optimize regressions for most cases on big amount of data by partially unrolling the big loop and disposing from loop counter (previously in len register). Results for -XX:+UseZba of StarFive VisionFive2 board:

Benchmark (count) Mode Cnt Score Error Units
CRC32.TestCRC32.testCRC32Update 64 thrpt 12 4215.728 3.972 ops/ms
CRC32.TestCRC32.testCRC32Update 128 thrpt 12 2607.882 1.627 ops/ms
CRC32.TestCRC32.testCRC32Update 256 thrpt 12 1364.899 8.857 ops/ms
CRC32.TestCRC32.testCRC32Update 512 thrpt 12 704.316 3.222 ops/ms
CRC32.TestCRC32.testCRC32Update 2048 thrpt 12 180.738 0.474 ops/ms
CRC32.TestCRC32.testCRC32Update 16384 thrpt 12 22.722 0.059 ops/ms
CRC32.TestCRC32.testCRC32Update 65536 thrpt 12 5.327 0.019 ops/ms

while the results for -XX:-UseCRC32Intrinsics are here

ArsenyBochkarev avatar Jan 09 '24 16:01 ArsenyBochkarev

Same the performance trend is that: the larger the data size, the closer the performance gap. when size is 65536, there seems a little perf regression. So I wonder how it will behave when the size is bigger than 65536, and whether we need to consider the size bigger than 65536 depends on what's the expected regular data size of java CRC32, are the larger data size (equal or larger than 65536) common cases?

Hamlin-Li avatar Jan 10 '24 10:01 Hamlin-Li

@ArsenyBochkarev This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Feb 07 '24 11:02 bridgekeeper[bot]

@ArsenyBochkarev This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8317721: RISC-V: Implement CRC32 intrinsic

Reviewed-by: vkempik, rehn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 1388 new commits pushed to the master branch:

  • 3ca2bcd402042791d7460dd79ee16a3f88436b3e: 8335060: ClassCastException after JDK-8294960
  • 747e1e47f576b0ca3ac97d1deea87418e67ff2d1: 8334295: CTW: update modules
  • 0a6ffa57954ddf4f92205205a5a1bada813d127a: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container
  • 71e3798bf67cddef37a8b4e377c4bf21dbd01567: 8335308: compiler/uncommontrap/DeoptReallocFailure.java times out with SerialGC on Windows
  • c7e9ebb4cfff56b7a977eb2942f563f96b3336bd: 8331732: [PPC64] Unify and optimize code which converts != 0 to 1
  • 53242cdf9ef17c502ebd541e84370e7c158639c1: 8335283: Build failure due to 'no_sanitize' attribute directive ignored
  • d9bcf061450ebfb7fe02b5a50c855db1d9178e5d: 8335217: Fix memory ordering in ClassLoaderData::ChunkedHandleList
  • bb18498d71dddf49db9bdfac886aed9ae123651d: 8335349: jcmd VM.classloaders "fold" option should be optional
  • 8350b1daedae8ef5785a7165e664b1d3149b18b7: 8335294: Fix simple -Wzero-as-null-pointer-constant warnings in gc code
  • 5d866bf17d96bd0f0e4545d7eee5912eda2e3a94: 8335252: Reduce size of j.u.Formatter.Conversion#isValid
  • ... and 1378 more: https://git.openjdk.org/jdk/compare/fb390d202c8bbbbb87ba48fd01387feb35a1b768...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@feilongjiang, @Hamlin-Li, @RealFYang, @VladimirKempik, @robehn) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

openjdk[bot] avatar Mar 13 '24 20:03 openjdk[bot]

@ArsenyBochkarev this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout JDK-8317721
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

openjdk[bot] avatar Mar 18 '24 10:03 openjdk[bot]

I managed to get some additional acceleration for cases with Zba enabled. Updated data for StarFive VisionFive2:

Benchmark (count) Mode Cnt Score Error Units
CRC32.TestCRC32.testCRC32Update 64 thrpt 12 4231.837 12.249 ops/ms
CRC32.TestCRC32.testCRC32Update 128 thrpt 12 2678.843 1.631 ops/ms
CRC32.TestCRC32.testCRC32Update 256 thrpt 12 1405.024 6.509 ops/ms
CRC32.TestCRC32.testCRC32Update 512 thrpt 12 727.608 1.393 ops/ms
CRC32.TestCRC32.testCRC32Update 2048 thrpt 12 186.552 0.389 ops/ms
CRC32.TestCRC32.testCRC32Update 16384 thrpt 12 23.423 0.087 ops/ms
CRC32.TestCRC32.testCRC32Update 65536 thrpt 12 5.493 0.015 ops/ms

Results for disabled intrinsic are here

performance numbers on unmatched

Current data for HiFive Unmatched (no Zba here!):

Enabled intrinsic:

Benchmark (count) Mode Cnt Score Error Units
CRC32.TestCRC32.testCRC32Update 64 thrpt 12 3180.082 ± 63.442 ops/ms
CRC32.TestCRC32.testCRC32Update 128 thrpt 12 1936.728 ± 17.332 ops/ms
CRC32.TestCRC32.testCRC32Update 256 thrpt 12 1019.500 ± 5.038 ops/ms
CRC32.TestCRC32.testCRC32Update 512 thrpt 12 527.775 ± 2.059 ops/ms
CRC32.TestCRC32.testCRC32Update 2048 thrpt 12 135.190 ± 0.279 ops/ms
CRC32.TestCRC32.testCRC32Update 16384 thrpt 12 16.996 ± 0.066 ops/ms
CRC32.TestCRC32.testCRC32Update 65536 thrpt 12 3.877 ± 0.011 ops/ms

Disabled intrinsic:

Benchmark (count) Mode Cnt Score Error Units
CRC32.TestCRC32.testCRC32Update 64 thrpt 12 992.300 ± 17.666 ops/ms
CRC32.TestCRC32.testCRC32Update 128 thrpt 12 818.234 ± 9.767 ops/ms
CRC32.TestCRC32.testCRC32Update 256 thrpt 12 605.509 ± 14.685 ops/ms
CRC32.TestCRC32.testCRC32Update 512 thrpt 12 402.414 ± 4.331 ops/ms
CRC32.TestCRC32.testCRC32Update 2048 thrpt 12 134.390 ± 1.399 ops/ms
CRC32.TestCRC32.testCRC32Update 16384 thrpt 12 18.619 ± 0.104 ops/ms
CRC32.TestCRC32.testCRC32Update 65536 thrpt 12 4.229 ± 0.020 ops/ms

ArsenyBochkarev avatar Mar 19 '24 11:03 ArsenyBochkarev

Perhaps we should do this intrinsic Zba-exclusive?

ArsenyBochkarev avatar Mar 19 '24 14:03 ArsenyBochkarev

More accel for -XX:+UseZba case (on StarFive VisionFive2):

Benchmark (count) Mode Cnt Score Error Units
CRC32.TestCRC32.testCRC32Update 64 thrpt 12 4415.416 2.822 ops/ms
CRC32.TestCRC32.testCRC32Update 128 thrpt 12 2756.321 0.769 ops/ms
CRC32.TestCRC32.testCRC32Update 256 thrpt 12 1450.461 4.115 ops/ms
CRC32.TestCRC32.testCRC32Update 512 thrpt 12 750.851 0.496 ops/ms
CRC32.TestCRC32.testCRC32Update 2048 thrpt 12 192.352 0.599 ops/ms
CRC32.TestCRC32.testCRC32Update 16384 thrpt 12 24.209 0.044 ops/ms
CRC32.TestCRC32.testCRC32Update 65536 thrpt 12 5.670 0.015 ops/ms

ArsenyBochkarev avatar Mar 31 '24 15:03 ArsenyBochkarev

The results looks good now on VF2. Considering Zba is mandatory for RVA22U64 this can be accepted now. Can some reviewer take a look at this again please ?

VladimirKempik avatar Apr 01 '24 10:04 VladimirKempik

Thanks for updating and continuous refinement.

image

Seems the performance gain (last column in the picture) introduced by intrinsic is getting less and less when the data size increasing. So IMHO, when data size is big enough, it brings performance regression rather than performance gain.

Hamlin-Li avatar Apr 02 '24 16:04 Hamlin-Li

@Hamlin-Li I modified test/micro/org/openjdk/bench/java/util/TestCRC32.java a bit to see if the regression happens on increased data, and run it on VisionFive2 with Zba enabed:

Enabled intrinsic

Benchmark (count) Mode Cnt Score Error Units
CRC32.TestCRC32.testCRC32Update 131072 thrpt 40 2.841 0.001 ops/ms
CRC32.TestCRC32.testCRC32Update 262144 thrpt 40 1.420 0.001 ops/ms
CRC32.TestCRC32.testCRC32Update 524288 thrpt 40 0.709 0.001 ops/ms
CRC32.TestCRC32.testCRC32Update 2097152 thrpt 40 0.176 0.001 ops/ms

Disabled intrinsic

Benchmark (count) Mode Cnt Score Error Units
CRC32.TestCRC32.testCRC32Update 131072 thrpt 40 2.729 0.003 ops/ms
CRC32.TestCRC32.testCRC32Update 262144 thrpt 40 1.367 0.001 ops/ms
CRC32.TestCRC32.testCRC32Update 524288 thrpt 40 0.684 0.001 ops/ms
CRC32.TestCRC32.testCRC32Update 2097152 thrpt 40 0.170 0.001 ops/ms
(count) enabled/disabled
131072 1,041040674
262144 1,038771031
524288 1.036549708
2097152 1,035294118

So since there are no regressions compared to C2-generated code with -XX:+UseZba, how about making the intrinsic Zba-exclusive?

ArsenyBochkarev avatar Apr 02 '24 18:04 ArsenyBochkarev

Thanks for updating. Seems fine, but I'm not sure. Maybe see how others think about it.

Just FYI, as the trend of performance gain in this implementation is less and less as the data size grow larger, so I wonder if the CRC algorithm used in this implementation is optimal enough. Seems there're other more advanced algorithms which are supposed to bring more optimistic performance gains, and some of these algorithms are already implemented on other platforms in jdk.

Hamlin-Li avatar Apr 03 '24 11:04 Hamlin-Li

@ArsenyBochkarev This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar May 17 '24 22:05 bridgekeeper[bot]

Hello again everyone! I made two changes to this PR:

  1. Now the intrinsic is Zba-exclusive.
  2. I used the slli + srliw combination instead of srli + andi and rescheduled them a bit, making use of t6 register.

Current results on StarFive VisionFive2 (with Zba) are:

Benchmark (count) Mode Cnt Score Error Units
CRC32.TestCRC32.testCRC32Update 64 thrpt 12 4644.129 9.566 ops/ms
CRC32.TestCRC32.testCRC32Update 128 thrpt 12 2911.927 12.866 ops/ms
CRC32.TestCRC32.testCRC32Update 256 thrpt 12 1538.630 5.463 ops/ms
CRC32.TestCRC32.testCRC32Update 512 thrpt 12 799.100 3.216 ops/ms
CRC32.TestCRC32.testCRC32Update 2048 thrpt 12 205.947 0.236 ops/ms
CRC32.TestCRC32.testCRC32Update 16384 thrpt 12 25.880 0.069 ops/ms
CRC32.TestCRC32.testCRC32Update 65536 thrpt 12 6.022 0.020 ops/ms

Results for disabled intrinsic on VisionFive2 (taken from here)

Benchmark (count) Mode Cnt Score Error Units
CRC32.TestCRC32.testCRC32Update 64 thrpt 12 1390.530 42.217 ops/ms
CRC32.TestCRC32.testCRC32Update 128 thrpt 12 1109.742 24.201 ops/ms
CRC32.TestCRC32.testCRC32Update 256 thrpt 12 805.345 12.155 ops/ms
CRC32.TestCRC32.testCRC32Update 512 thrpt 12 520.965 5.651 ops/ms
CRC32.TestCRC32.testCRC32Update 2048 thrpt 12 169.591 0.747 ops/ms
CRC32.TestCRC32.testCRC32Update 16384 thrpt 12 22.624 0.139 ops/ms
CRC32.TestCRC32.testCRC32Update 65536 thrpt 12 5.430 0.016 ops/ms

ArsenyBochkarev avatar May 22 '24 14:05 ArsenyBochkarev

@ArsenyBochkarev This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Jun 19 '24 18:06 bridgekeeper[bot]

Hi all! Can anyone take another look, please?

Current numbers on VisionFive2: with intrinsic enabled:

Benchmark (count) Mode Cnt Score Error Units
CRC32.TestCRC32.testCRC32Update 64 thrpt 12 4644.129 9.566 ops/ms
CRC32.TestCRC32.testCRC32Update 128 thrpt 12 2911.927 12.866 ops/ms
CRC32.TestCRC32.testCRC32Update 256 thrpt 12 1538.630 5.463 ops/ms
CRC32.TestCRC32.testCRC32Update 512 thrpt 12 799.100 3.216 ops/ms
CRC32.TestCRC32.testCRC32Update 2048 thrpt 12 205.947 0.236 ops/ms
CRC32.TestCRC32.testCRC32Update 16384 thrpt 12 25.880 0.069 ops/ms
CRC32.TestCRC32.testCRC32Update 65536 thrpt 12 6.022 0.020 ops/ms
CRC32.TestCRC32.testCRC32Update 131072 thrpt 40 3.030 0.001 ops/ms
CRC32.TestCRC32.testCRC32Update 262144 thrpt 40 1.514 0.001 ops/ms
CRC32.TestCRC32.testCRC32Update 524288 thrpt 40 0.757 0.001 ops/ms
CRC32.TestCRC32.testCRC32Update 2097152 thrpt 40 0.188 0.001 ops/ms

and without intrinsic enabled:

Benchmark (count) Mode Cnt Score Error Units
CRC32.TestCRC32.testCRC32Update 64 thrpt 12 1390.530 42.217 ops/ms
CRC32.TestCRC32.testCRC32Update 128 thrpt 12 1109.742 24.201 ops/ms
CRC32.TestCRC32.testCRC32Update 256 thrpt 12 805.345 12.155 ops/ms
CRC32.TestCRC32.testCRC32Update 512 thrpt 12 520.965 5.651 ops/ms
CRC32.TestCRC32.testCRC32Update 2048 thrpt 12 169.591 0.747 ops/ms
CRC32.TestCRC32.testCRC32Update 16384 thrpt 12 22.624 0.139 ops/ms
CRC32.TestCRC32.testCRC32Update 65536 thrpt 12 5.430 0.016 ops/ms
CRC32.TestCRC32.testCRC32Update 131072 thrpt 40 2.729 0.003 ops/ms
CRC32.TestCRC32.testCRC32Update 262144 thrpt 40 1.367 0.001 ops/ms
CRC32.TestCRC32.testCRC32Update 524288 thrpt 40 0.684 0.001 ops/ms
CRC32.TestCRC32.testCRC32Update 2097152 thrpt 40 0.170 0.001 ops/ms

ArsenyBochkarev avatar Jun 29 '24 12:06 ArsenyBochkarev

Thanks everyone for all the comments! Sanity checks for ./test/hotspot/jtreg/compiler/codegen/CRCTest.java and test/hotspot/jtreg/compiler/intrinsics/zip/TestCRC32.java are ok.

ArsenyBochkarev avatar Jul 01 '24 12:07 ArsenyBochkarev

/integrate

ArsenyBochkarev avatar Jul 01 '24 12:07 ArsenyBochkarev

@ArsenyBochkarev Your change (at version 204e6caa409edfe581d8f9b9018dd41e87c19346) is now ready to be sponsored by a Committer.

openjdk[bot] avatar Jul 01 '24 12:07 openjdk[bot]

/sponsor

luhenry avatar Jul 01 '24 12:07 luhenry

Going to push as commit 2f4f6cc34c10c5519c74abbce8d1715013b50d5d. Since your change was applied there have been 1388 commits pushed to the master branch:

  • 3ca2bcd402042791d7460dd79ee16a3f88436b3e: 8335060: ClassCastException after JDK-8294960
  • 747e1e47f576b0ca3ac97d1deea87418e67ff2d1: 8334295: CTW: update modules
  • 0a6ffa57954ddf4f92205205a5a1bada813d127a: 8261242: [Linux] OSContainer::is_containerized() returns true when run outside a container
  • 71e3798bf67cddef37a8b4e377c4bf21dbd01567: 8335308: compiler/uncommontrap/DeoptReallocFailure.java times out with SerialGC on Windows
  • c7e9ebb4cfff56b7a977eb2942f563f96b3336bd: 8331732: [PPC64] Unify and optimize code which converts != 0 to 1
  • 53242cdf9ef17c502ebd541e84370e7c158639c1: 8335283: Build failure due to 'no_sanitize' attribute directive ignored
  • d9bcf061450ebfb7fe02b5a50c855db1d9178e5d: 8335217: Fix memory ordering in ClassLoaderData::ChunkedHandleList
  • bb18498d71dddf49db9bdfac886aed9ae123651d: 8335349: jcmd VM.classloaders "fold" option should be optional
  • 8350b1daedae8ef5785a7165e664b1d3149b18b7: 8335294: Fix simple -Wzero-as-null-pointer-constant warnings in gc code
  • 5d866bf17d96bd0f0e4545d7eee5912eda2e3a94: 8335252: Reduce size of j.u.Formatter.Conversion#isValid
  • ... and 1378 more: https://git.openjdk.org/jdk/compare/fb390d202c8bbbbb87ba48fd01387feb35a1b768...master

Your commit was automatically rebased without conflicts.

openjdk[bot] avatar Jul 01 '24 12:07 openjdk[bot]