Add 1-way SSE4 SHA256 implementation using intrinsics for MSVC builds
This PR reintroduces the 1-way SSE4 SHA256 implementation using intrinsics, as suggested in https://github.com/bitcoin/bitcoin/pull/13442, specifically for MSVC builds, where it improves SHA256 throughput by roughly 40% to 50% in the benchmarks below.
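For reviewers unfamiliar with the implementation labels used in the benchmark names below, here is a minimal sketch of how such a label could be assembled from detected CPU features. The `CpuFeatures` struct, the `DescribeSha256Selection` function and the `have_1way_sse41` parameter are hypothetical names chosen for illustration, not the code in this PR, and the assumption that AVX2 is only considered when SSE4.1 is available is mine.

```cpp
#include <string>

// Hypothetical feature flags; a real build would fill these from CPUID.
struct CpuFeatures {
    bool sse41;
    bool avx2;
};

// Builds a label like the ones shown in the benchmark names.
// `have_1way_sse41` models whether a 1-way SSE4.1 routine is compiled in
// (assumption: without it, the generic "standard" routine stays in use).
std::string DescribeSha256Selection(const CpuFeatures& cpu, bool have_1way_sse41)
{
    std::string label = "standard";
    if (cpu.sse41) {
        // The 1-way routine replaces the generic one instead of adding to it.
        if (have_1way_sse41) label = "sse41(1way)";
        label += ",sse41(4way)";
        // Assumption: AVX2 is only considered when SSE4.1 is present.
        if (cpu.avx2) label += ",avx2(8way)";
    }
    return label;
}
```

With both flags set, this sketch yields `standard,sse41(4way),avx2(8way)` when no 1-way routine is available and `sse41(1way),sse41(4way),avx2(8way)` when it is, matching the two label styles seen in the tables.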
Here are benchmarks on my machine with an Intel Core i5-8350U CPU (no sha_ni flag) and Windows 11 Pro 22H2:
- before this PR (8a9e37fb95cbb0bf7f6e06fa05d8381db04d61e2):
>.\src\bench_bitcoin.exe -filter=SHA256_.*
| ns/byte | byte/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 9.92 | 100,826,852.23 | 0.1% | 0.01 | SHA256_32b_AVX2 using the 'standard,sse41(4way),avx2(8way)' SHA256 implementation
| 9.90 | 101,038,141.67 | 0.3% | 0.01 | SHA256_32b_SHANI using the 'standard,sse41(4way)' SHA256 implementation
| 10.02 | 99,788,852.31 | 0.9% | 0.01 | SHA256_32b_SSE4 using the 'standard,sse41(4way)' SHA256 implementation
| 10.01 | 99,883,509.98 | 0.8% | 0.01 | SHA256_32b_STANDARD using the 'standard' SHA256 implementation
| 4.48 | 223,348,893.31 | 1.1% | 0.05 | SHA256_AVX2 using the 'standard,sse41(4way),avx2(8way)' SHA256 implementation
| 4.47 | 223,668,612.58 | 1.2% | 0.05 | SHA256_SHANI using the 'standard,sse41(4way)' SHA256 implementation
| 4.45 | 224,638,332.29 | 0.7% | 0.05 | SHA256_SSE4 using the 'standard,sse41(4way)' SHA256 implementation
| 4.45 | 224,542,494.67 | 0.6% | 0.05 | SHA256_STANDARD using the 'standard' SHA256 implementation
- with this PR:
>.\src\bench_bitcoin.exe -filter=SHA256_.*
| ns/byte | byte/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| 7.04 | 142,024,691.36 | 0.2% | 0.01 | SHA256_32b_AVX2 using the 'sse41(1way),sse41(4way),avx2(8way)' SHA256 implementation
| 7.03 | 142,222,222.22 | 0.2% | 0.01 | SHA256_32b_SHANI using the 'sse41(1way),sse41(4way)' SHA256 implementation
| 7.08 | 141,231,323.51 | 0.8% | 0.01 | SHA256_32b_SSE4 using the 'sse41(1way),sse41(4way)' SHA256 implementation
| 9.88 | 101,196,866.84 | 0.4% | 0.01 | SHA256_32b_STANDARD using the 'standard' SHA256 implementation
| 3.01 | 332,270,069.11 | 1.3% | 0.03 | SHA256_AVX2 using the 'sse41(1way),sse41(4way),avx2(8way)' SHA256 implementation
| 3.00 | 332,989,244.45 | 0.3% | 0.03 | SHA256_SHANI using the 'sse41(1way),sse41(4way)' SHA256 implementation
| 3.04 | 328,612,270.38 | 2.0% | 0.03 | SHA256_SSE4 using the 'sse41(1way),sse41(4way)' SHA256 implementation
| 4.45 | 224,678,709.45 | 0.4% | 0.05 | SHA256_STANDARD using the 'standard' SHA256 implementation
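To make the comparison between the two tables above explicit, the following sketch converts a pair of ns/byte figures into a throughput gain and a time reduction. The function names are illustrative only and are not part of the benchmark tooling.

```cpp
// Throughput gain in percent when the time per byte drops from
// `ns_before` to `ns_after` (throughput is the inverse of ns/byte).
double ThroughputGainPercent(double ns_before, double ns_after)
{
    return (ns_before / ns_after - 1.0) * 100.0;
}

// Reduction in time per byte, in percent of the original time.
double TimeReductionPercent(double ns_before, double ns_after)
{
    return (1.0 - ns_after / ns_before) * 100.0;
}
```

For the SHA256_SSE4 rows (4.45 ns/byte before, 3.04 ns/byte after) this gives a throughput gain of roughly 46%, and for the SHA256_32b_SSE4 rows (10.02 before, 7.08 after) roughly 41.5%. The SHA256_STANDARD rows are essentially unchanged, as expected, since they do not use the new code path.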
Based on https://github.com/bitcoin/bitcoin/pull/24773.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.
Code Coverage
For detailed information about the code coverage, see the test coverage report.
Reviews
See the guideline for information on the review process. A summary of reviews will appear here.
Conflicts
Reviewers, this pull request conflicts with the following ones:
- #29774 (build: Enable fuzz binary in MSVC by hebasto)
- #29625 (Several randomness improvements by sipa)
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
Rebased on top of the merged #27598.
🐙 This pull request conflicts with the target branch and needs rebase.
Based on #24773.
Deferring to after cmake.