Add 1-way SSE4 SHA256 implementation using intrinsics for MSVC builds

Open hebasto opened this issue 1 year ago • 2 comments

This PR reintroduces the 1-way SSE4 SHA256 implementation using intrinsics, as suggested in https://github.com/bitcoin/bitcoin/pull/13442. It targets MSVC builds specifically, where the benchmarks below show a throughput gain of roughly 40% to 50%.
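
As a rough illustration of what "using intrinsics" means here, the sketch below evaluates the SHA-256 message-schedule function σ0 (ROTR 7, ROTR 18, SHR 3, as defined in FIPS 180-4) for four 32-bit words at once and checks the result against a scalar version. It is not the code from this PR: the names `sigma0_x4`, `sigma0_scalar` and `SKETCH_HAVE_SSE2` are hypothetical, and it uses only SSE2 intrinsics so that it builds on any x86-64 compiler without extra flags, with a scalar fallback elsewhere.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: SSE2 is detected via common compiler macros; other targets
// fall back to the scalar loop below.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h> // SSE2 intrinsics
#define SKETCH_HAVE_SSE2 1
#endif

// Rotate a 32-bit word right by n bits (0 < n < 32).
inline uint32_t rotr32(uint32_t v, int n) { return (v >> n) | (v << (32 - n)); }

// Scalar reference: sigma0(x) = ROTR7(x) ^ ROTR18(x) ^ SHR3(x).
inline uint32_t sigma0_scalar(uint32_t x)
{
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3);
}

// Hypothetical helper: sigma0 for four words in one go.
inline void sigma0_x4(const uint32_t in[4], uint32_t out[4])
{
#ifdef SKETCH_HAVE_SSE2
    const __m128i x = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
    // SSE2 has no 32-bit rotate, so each rotate is built from two shifts and an OR.
    const __m128i r7 = _mm_or_si128(_mm_srli_epi32(x, 7), _mm_slli_epi32(x, 25));
    const __m128i r18 = _mm_or_si128(_mm_srli_epi32(x, 18), _mm_slli_epi32(x, 14));
    const __m128i s3 = _mm_srli_epi32(x, 3);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out),
                     _mm_xor_si128(_mm_xor_si128(r7, r18), s3));
#else
    for (int i = 0; i < 4; ++i) out[i] = sigma0_scalar(in[i]);
#endif
}

// Checks that the 4-lane result agrees with the scalar reference.
inline void sigma0_self_check()
{
    const uint32_t in[4] = {0u, 1u, 0x80000000u, 0xdeadbeefu};
    uint32_t out[4];
    sigma0_x4(in, out);
    for (int i = 0; i < 4; ++i) assert(out[i] == sigma0_scalar(in[i]));
}
```

Calling `sigma0_self_check()` from a test or from `main` exercises both paths. A full 1-way implementation would also need the remaining schedule terms and the round function, which this sketch leaves out.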

Here are benchmarks on my machine with Intel Core i5-8350U CPU (no sha_ni flag) + Windows 11 Pro 22H2:

  • before this PR (8a9e37fb95cbb0bf7f6e06fa05d8381db04d61e2):
>.\src\bench_bitcoin.exe -filter=SHA256_.*

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                9.92 |      100,826,852.23 |    0.1% |      0.01 | SHA256_32b_AVX2 using the 'standard,sse41(4way),avx2(8way)' SHA256 implementation
|                9.90 |      101,038,141.67 |    0.3% |      0.01 | SHA256_32b_SHANI using the 'standard,sse41(4way)' SHA256 implementation
|               10.02 |       99,788,852.31 |    0.9% |      0.01 | SHA256_32b_SSE4 using the 'standard,sse41(4way)' SHA256 implementation
|               10.01 |       99,883,509.98 |    0.8% |      0.01 | SHA256_32b_STANDARD using the 'standard' SHA256 implementation
|                4.48 |      223,348,893.31 |    1.1% |      0.05 | SHA256_AVX2 using the 'standard,sse41(4way),avx2(8way)' SHA256 implementation
|                4.47 |      223,668,612.58 |    1.2% |      0.05 | SHA256_SHANI using the 'standard,sse41(4way)' SHA256 implementation
|                4.45 |      224,638,332.29 |    0.7% |      0.05 | SHA256_SSE4 using the 'standard,sse41(4way)' SHA256 implementation
|                4.45 |      224,542,494.67 |    0.6% |      0.05 | SHA256_STANDARD using the 'standard' SHA256 implementation
  • with this PR:
>.\src\bench_bitcoin.exe -filter=SHA256_.*

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                7.04 |      142,024,691.36 |    0.2% |      0.01 | SHA256_32b_AVX2 using the 'sse41(1way),sse41(4way),avx2(8way)' SHA256 implementation
|                7.03 |      142,222,222.22 |    0.2% |      0.01 | SHA256_32b_SHANI using the 'sse41(1way),sse41(4way)' SHA256 implementation
|                7.08 |      141,231,323.51 |    0.8% |      0.01 | SHA256_32b_SSE4 using the 'sse41(1way),sse41(4way)' SHA256 implementation
|                9.88 |      101,196,866.84 |    0.4% |      0.01 | SHA256_32b_STANDARD using the 'standard' SHA256 implementation
|                3.01 |      332,270,069.11 |    1.3% |      0.03 | SHA256_AVX2 using the 'sse41(1way),sse41(4way),avx2(8way)' SHA256 implementation
|                3.00 |      332,989,244.45 |    0.3% |      0.03 | SHA256_SHANI using the 'sse41(1way),sse41(4way)' SHA256 implementation
|                3.04 |      328,612,270.38 |    2.0% |      0.03 | SHA256_SSE4 using the 'sse41(1way),sse41(4way)' SHA256 implementation
|                4.45 |      224,678,709.45 |    0.4% |      0.05 | SHA256_STANDARD using the 'standard' SHA256 implementation
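
For readers who want to check the gain against the tables above, here is a small sketch of the arithmetic. The helper names `speedup`, `percent_gain` and `bytes_per_second` are hypothetical, and the inputs are the rounded ns/byte values from the tables, so the results are approximate.

```cpp
#include <cassert>
#include <cstdio>

// Speedup factor from the time per byte before and after the change.
constexpr double speedup(double ns_before, double ns_after)
{
    return ns_before / ns_after;
}

// Throughput gain in percent, derived from the speedup factor.
constexpr double percent_gain(double ns_before, double ns_after)
{
    return (speedup(ns_before, ns_after) - 1.0) * 100.0;
}

// Throughput in byte/s for a given time per byte in nanoseconds.
constexpr double bytes_per_second(double ns_per_byte)
{
    return 1e9 / ns_per_byte;
}

// Prints the gain for two rows taken from the tables above.
inline void print_gains()
{
    // SHA256_32b_AVX2 row: 9.92 ns/byte before, 7.04 ns/byte with the PR.
    std::printf("32-byte input: %.1f%% gain\n", percent_gain(9.92, 7.04));
    // SHA256_AVX2 row: 4.48 ns/byte before, 3.01 ns/byte with the PR.
    std::printf("large input:   %.1f%% gain\n", percent_gain(4.48, 3.01));
}
```

With these inputs the gain comes out at roughly 41% for the 32-byte benchmark and roughly 49% for the large-input benchmark. The SHA256_STANDARD rows are essentially unchanged, as expected, because they keep using the 'standard' implementation.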

Based on https://github.com/bitcoin/bitcoin/pull/24773.

hebasto, Sep 24 '23 14:09

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process. A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #29774 (build: Enable fuzz binary in MSVC by hebasto)
  • #29625 (Several randomness improvements by sipa)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot, Sep 24 '23 14:09

Rebased on top of the merged #27598.

hebasto, Oct 04 '23 15:10

🐙 This pull request conflicts with the target branch and needs rebase.

DrahtBot, Apr 28 '24 03:04

Based on #24773.

Deferring this until after the migration to CMake.

hebasto, Apr 28 '24 05:04