[Enhancement] use xxh3_hash for exchange to improve performance
Why I'm doing:
Why replace `fnv_hash` with `xxh3`?
- Based on the benchmark results from smhasher, `xxh3` is among the fastest and highest-quality non-cryptographic hash algorithms, significantly outperforming FNV in speed.
- xxhash offers multiple variants, including `XXH32_family`, `XXH64_family`, and `XXH3_family`; here, we have selected `xxh3_64`, implemented by Cyan4973.
| Hash Name | ISA ext | Width | Large Data Speed | Small Data Velocity |
|---|---|---|---|---|
| XXH3_64bits() | AVX2 | 64 | 59.4 GB/s | 133.1 |
| MeowHash | AES-NI | 128 | 58.2 GB/s | 52.5 |
| XXH3_128bits() | AVX2 | 128 | 57.9 GB/s | 118.1 |
| CLHash | PCLMUL | 64 | 37.1 GB/s | 58.1 |
| XXH3_64bits() | SSE2 | 64 | 31.5 GB/s | 133.1 |
| XXH3_128bits() | SSE2 | 128 | 29.6 GB/s | 118.1 |
| RAM sequential read | | N/A | 28.0 GB/s | N/A |
| ahash | AES-NI | 64 | 22.5 GB/s | 107.2 |
| City64 | | 64 | 22.0 GB/s | 76.6 |
| T1ha2 | | 64 | 22.0 GB/s | 99.0 |
| City128 | | 128 | 21.7 GB/s | 57.7 |
| FarmHash | AES-NI | 64 | 21.3 GB/s | 71.9 |
| XXH64() | | 64 | 19.4 GB/s | 71.0 |
| SpookyHash | | 64 | 19.3 GB/s | 53.2 |
| Mum | | 64 | 18.0 GB/s | 67.0 |
| CRC32C | SSE4.2 | 32 | 13.0 GB/s | 57.9 |
| XXH32() | | 32 | 9.7 GB/s | 71.9 |
| City32 | | 32 | 9.1 GB/s | 66.0 |
| Blake3* | AVX2 | 256 | 4.4 GB/s | 8.1 |
| Murmur3 | | 32 | 3.9 GB/s | 56.1 |
| SipHash* | | 64 | 3.0 GB/s | 43.2 |
| Blake3* | SSE2 | 256 | 2.4 GB/s | 8.1 |
| HighwayHash | | 64 | 1.4 GB/s | 6.0 |
| FNV64 | | 64 | 1.2 GB/s | 62.7 |
| Blake2* | | 256 | 1.1 GB/s | 5.1 |
| SHA1* | | 160 | 0.8 GB/s | 5.6 |
| MD5* | | 128 | 0.6 GB/s | 7.8 |
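For intuition on why FNV sits near the bottom of the large-data column, here is a minimal sketch of the standard 64-bit FNV-1a algorithm. It is not StarRocks' `fnv_hash`, whose variant and seed may differ; the function name `fnv1a_64` is chosen for illustration only. Each input byte costs one XOR and one multiply, and every step depends on the previous one, whereas XXH3 consumes long inputs in wide blocks that map onto SIMD instructions.

```cpp
#include <cstdint>
#include <string_view>

// Standard 64-bit FNV-1a (illustrative; not StarRocks' fnv_hash).
// One XOR and one multiply per byte, each depending on the previous result,
// so the loop forms a serial chain that cannot be vectorized.
inline uint64_t fnv1a_64(std::string_view data) {
    uint64_t hash = 0xcbf29ce484222325ULL;        // FNV offset basis
    constexpr uint64_t kPrime = 0x100000001b3ULL; // FNV prime
    for (unsigned char byte : data) {
        hash ^= byte;
        hash *= kPrime;
    }
    return hash;
}
```

Hashing an empty input returns the offset basis unchanged, which is a quick sanity check for any FNV-1a implementation.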
What I'm doing:
Introduce a new session variable `exchange_hash_function_version` to manage the hash function utilized in exchange shuffle operations:
- 0: `fnv_hash` (default, ensuring backward compatibility)
- 1: `xxh3_hash` (significantly faster, offering ~2x improvement on small datasets and ~50x on large datasets)
Key Updates:
- Extend the `Column` base class by adding the `xxh3_hash` interface, implementing it across all column types (e.g., fixed_length, binary, nullable, const, array, map, struct, json, object, etc.).
- Integrate `exchange_hash_function_version` into `TQueryOptions` within Thrift.
- Define the session variable in `SessionVariable.java` (Frontend).
- Update `ExchangeSinkOperator` to utilize `xxh3_hash` when the version is set to 1.
- Introduce a "HashFunction" metric in the exchange sink profile for monitoring purposes.
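The column-level interface in the first bullet can be pictured with the following simplified sketch. Everything here is hypothetical: the class shapes, the method name `hash_rows`, and the helper `hash_key_columns` are illustrative and are not the PR's actual signatures. The byte mixer is FNV-1a style purely so the sketch compiles without the xxHash library; the PR uses XXH3 at that step. The point it tries to show is that each column folds its values into a shared per-row hash array, so a multi-column shuffle key is hashed by chaining columns.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Stand-in mixer (FNV-1a style); the PR would call XXH3 here instead.
inline uint32_t mix_bytes(const void* data, size_t len, uint32_t seed) {
    const auto* bytes = static_cast<const unsigned char*>(data);
    uint32_t hash = seed;
    for (size_t i = 0; i < len; ++i) {
        hash ^= bytes[i];
        hash *= 16777619u; // 32-bit FNV prime
    }
    return hash;
}

// Hypothetical, simplified column interface (not the real Column class).
class Column {
public:
    virtual ~Column() = default;
    virtual size_t size() const = 0;
    // Updates hashes[i] for rows in [from, to), seeding with the current value.
    virtual void hash_rows(uint32_t* hashes, uint32_t from, uint32_t to) const = 0;
};

class Int32Column final : public Column {
public:
    explicit Int32Column(std::vector<int32_t> values) : _values(std::move(values)) {}
    size_t size() const override { return _values.size(); }
    void hash_rows(uint32_t* hashes, uint32_t from, uint32_t to) const override {
        for (uint32_t i = from; i < to; ++i) {
            hashes[i] = mix_bytes(&_values[i], sizeof(int32_t), hashes[i]);
        }
    }

private:
    std::vector<int32_t> _values;
};

// Hashes a multi-column key by passing one hash array through every column.
inline std::vector<uint32_t> hash_key_columns(const std::vector<const Column*>& columns,
                                              size_t num_rows) {
    std::vector<uint32_t> hashes(num_rows, 2166136261u); // arbitrary initial seed
    for (const Column* column : columns) {
        column->hash_rows(hashes.data(), 0, static_cast<uint32_t>(num_rows));
    }
    return hashes;
}
```

Rows with identical key values across all columns end up with identical hashes, which is the property a hash shuffle relies on to route matching rows to the same destination.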
Compatibility
- Rolling Upgrade: If the Backend encounters an unrecognized hash function version, it will default to `fnv_hash` to maintain seamless operation.
- Exchange Hash-Shuffle: Changed from FNV to XXH3
- Exchange Bucket-Shuffle: OLAP table `DISTRIBUTED BY HASH` uses crc32. The exchange hash remains consistent to ensure data compatibility.
- Local-Exchange: Changed from FNV to XXH3
- RuntimeFilter-Shuffle: Changed from FNV to XXH3
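The rolling-upgrade rule above could be sketched roughly as follows. The enum `ExchangeHashFunction` and the function `resolve_exchange_hash_function` are hypothetical names introduced for illustration; the PR's actual dispatch code may be structured differently.

```cpp
#include <cstdint>

// Hypothetical names for illustration; the PR's actual types may differ.
enum class ExchangeHashFunction { kFnv, kXxh3 };

// Maps the value of exchange_hash_function_version to a hash function.
// Any value other than 1, including versions this build does not recognize,
// falls back to FNV, mirroring the rolling-upgrade rule.
inline ExchangeHashFunction resolve_exchange_hash_function(int32_t version) {
    switch (version) {
        case 1:
            return ExchangeHashFunction::kXxh3;
        case 0:
        default:
            return ExchangeHashFunction::kFnv;
    }
}
```

Treating the fallback as the `default` branch means a Backend that receives a version it does not know keeps producing FNV hashes without failing the query.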
Fixes #issue
What type of PR is this:
- [ ] BugFix
- [ ] Feature
- [x] Enhancement
- [ ] Refactor
- [ ] UT
- [ ] Doc
- [ ] Tool
Does this PR entail a change in behavior?
- [ ] Yes, this PR will result in a change in behavior.
- [x] No, this PR will not result in a change in behavior.
If yes, please specify the type of change:
- [ ] Interface/UI changes: syntax, type conversion, expression evaluation, display information
- [ ] Parameter changes: default values, similar parameters but with different default values
- [ ] Policy changes: use new policy to replace old one, functionality automatically enabled
- [ ] Feature removed
- [ ] Miscellaneous: upgrade & downgrade compatibility, etc.
Checklist:
- [ ] I have added test cases for my bug fix or my new feature
- [ ] This pr needs user documentation (for new or modified features or behaviors)
- [ ] I have added documentation for my new feature or new function
- [ ] This is a backport pr
Bugfix cherry-pick branch check:
- [x] I have checked the version labels which the pr will be auto-backported to the target branch
- [ ] 4.0
- [ ] 3.5
- [ ] 3.4
- [ ] 3.3
[!NOTE] Adds xxh3 hash for exchange/local shuffle and runtime filters, selectable via new session variable/query option, with FNV as fallback.
- Hashing & Config
  - Add `exchange_hash_function_version` (FE session var; Thrift `TQueryOptions.exchange_hash_function_version`) to choose hash (0=fnv, 1=xxh3).
  - `HashUtil`: add `XXH3_SEED`/`XXH3_SEED_32`.
- Column API
  - Add `Column::{xxh3_hash, xxh3_hash_with_selection, xxh3_hash_selective}` and implementations.
- Exchange/Shuffles
  - `ExchangeSinkOperator` and local exchangers (`ShufflePartitioner`, `KeyPartitionExchanger`): compute partition hashes with `xxh3` when version=1; otherwise `fnv`. Add profile info `HashFunction`.
- Runtime Filters
  - Extend runtime filter hashing to support `xxh3` via `RunningContext.exchange_hash_function_version`; update iterators and partition-index computations; propagate version through probe descriptors and predicates.
- Thrift/FE
  - Thrift: add `exchange_hash_function_version` to `TQueryOptions`.
  - FE `SessionVariable`: define `EXCHANGE_HASH_FUNCTION_VERSION` (default 1) and set into Thrift.
- Tests
  - Add SQL tests `test_exchange_hash_function_version` validating behavior under versions 0 and 1.

Written by Cursor Bugbot for commit 4448efbb3b272d2ab54a3089ed34926a3b3627c7. This will update automatically on new commits.
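The selection-based variants named in the summary can be illustrated with the sketch below. It assumes, from the method names alone, that `_with_selection` hashes only rows flagged in a byte mask and `_selective` hashes only rows listed in an index array. The free functions, their parameters, and the modulo-based `partition_index` are all hypothetical, and the mixer is FNV-1a style only so the sketch builds without the xxHash library.

```cpp
#include <cstdint>
#include <vector>

// Stand-in mixer (FNV-1a over the 4 bytes of the value); the PR uses XXH3.
inline uint32_t mix_value(int32_t value, uint32_t seed) {
    auto bits = static_cast<uint32_t>(value);
    uint32_t hash = seed;
    for (int shift = 0; shift < 32; shift += 8) {
        hash ^= (bits >> shift) & 0xffu;
        hash *= 16777619u; // 32-bit FNV prime
    }
    return hash;
}

// Hashes rows in [from, to) whose selection flag is non-zero (assumed semantics).
inline void hash_with_selection(const std::vector<int32_t>& values, uint32_t* hashes,
                                const uint8_t* selection, uint16_t from, uint16_t to) {
    for (uint16_t i = from; i < to; ++i) {
        if (selection[i] != 0) {
            hashes[i] = mix_value(values[i], hashes[i]);
        }
    }
}

// Hashes only the rows whose indices appear in sel[0..sel_size) (assumed semantics).
inline void hash_selective(const std::vector<int32_t>& values, uint32_t* hashes,
                           const uint16_t* sel, uint16_t sel_size) {
    for (uint16_t i = 0; i < sel_size; ++i) {
        const uint16_t row = sel[i];
        hashes[row] = mix_value(values[row], hashes[row]);
    }
}

// Illustrative partition mapping; the real reduction may not be a plain modulo.
inline uint32_t partition_index(uint32_t hash, uint32_t num_partitions) {
    return hash % num_partitions;
}
```

Under these assumptions, a byte mask of `{1, 0, 1, 0}` and an index list of `{0, 2}` select the same rows, so both functions leave identical hash arrays and never touch the unselected rows.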
🧪 CI Insights
Here's what we observed from your CI run for 4448efbb.
🟢 All jobs passed!
But CI Insights is watching 👀
@cursor review
@mergifyio rebase
rebase
❌ Base branch update has failed
Git reported the following error:
Rebasing (1/6)
Auto-merging be/src/column/column.cpp
Auto-merging be/src/column/column.h
Auto-merging be/src/exec/pipeline/exchange/exchange_sink_operator.cpp
Auto-merging be/src/exec/pipeline/exchange/exchange_sink_operator.h
Auto-merging be/src/exprs/runtime_filter_bank.cpp
Auto-merging fe/fe-core/src/main/java/com/starrocks/qe/SessionVariable.java
Auto-merging gensrc/thrift/InternalService.thrift
CONFLICT (content): Merge conflict in gensrc/thrift/InternalService.thrift
error: could not apply e20d71a309... cursor draft
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply e20d71a309... cursor draft
Quality Gate passed
Issues
0 New issues
0 Accepted issues
Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code
[Java-Extensions Incremental Coverage Report]
:white_check_mark: pass : 0 / 0 (0%)
[FE Incremental Coverage Report]
:white_check_mark: pass : 2 / 2 (100.00%)
file detail
| | path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|---|
| :large_blue_circle: | com/starrocks/qe/SessionVariable.java | 2 | 2 | 100.00% | [] |
[BE Incremental Coverage Report]
:white_check_mark: pass : 69 / 78 (88.46%)
file detail
| | path | covered_line | new_line | coverage | not_covered_line_detail |
|---|---|---|---|---|---|
| :large_blue_circle: | be/src/exprs/runtime_filter_bank.cpp | 5 | 7 | 71.43% | [607, 894] |
| :large_blue_circle: | be/src/exprs/runtime_filter.h | 16 | 20 | 80.00% | [267, 270, 1365, 1390] |
| :large_blue_circle: | be/src/exec/pipeline/exchange/local_exchange.cpp | 20 | 23 | 86.96% | [95, 96, 97] |
| :large_blue_circle: | be/src/exec/pipeline/exchange/exchange_sink_operator.cpp | 10 | 10 | 100.00% | [] |
| :large_blue_circle: | be/src/storage/runtime_filter_predicate.cpp | 7 | 7 | 100.00% | [] |
| :large_blue_circle: | be/src/exec/pipeline/exchange/local_exchange.h | 1 | 1 | 100.00% | [] |
| :large_blue_circle: | be/src/exprs/runtime_filter_bank.h | 1 | 1 | 100.00% | [] |
| :large_blue_circle: | be/src/column/column.cpp | 9 | 9 | 100.00% | [] |