datafusion-comet
fix: Compute murmur3 hash with dictionary input correctly
Which issue does this PR close?
Closes #427
Rationale for this change
Bug fix. While working on #424, we found a bug in spark_hash: it doesn't handle dictionary arrays correctly. This PR fixes that first.
What changes are included in this PR?
- refactor parts of spark_hash.rs to prepare for xxhash64 support
- unpack dictionary arrays when computing hashes
- update tests
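The idea behind "unpack dictionary arrays" can be illustrated with a minimal, standalone sketch. It assumes a simplified dictionary representation (a slice of optional keys plus a slice of optional dictionary values) instead of arrow-rs's `DictionaryArray`, and the names `hash_dictionary`, `hash_plain` and `decode` are hypothetical, not the functions in spark_hash.rs. The per-value mixing follows Spark's `Murmur3_x86_32.hashInt`, with Spark's default seed of 42. The point is that each row's key is resolved to its value before hashing, so a dictionary column hashes exactly like its decoded plain column; hashing the keys, or the dictionary values array directly, would not.

```rust
// Spark-compatible Murmur3_x86_32 for one 32-bit int (mirrors Spark's hashInt).
const C1: u32 = 0xcc9e_2d51;
const C2: u32 = 0x1b87_3593;

fn mix_k1(k1: u32) -> u32 {
    k1.wrapping_mul(C1).rotate_left(15).wrapping_mul(C2)
}

fn mix_h1(h1: u32, k1: u32) -> u32 {
    (h1 ^ k1).rotate_left(13).wrapping_mul(5).wrapping_add(0xe654_6b64)
}

fn fmix(mut h1: u32, len: u32) -> u32 {
    h1 ^= len;
    h1 ^= h1 >> 16;
    h1 = h1.wrapping_mul(0x85eb_ca6b);
    h1 ^= h1 >> 13;
    h1 = h1.wrapping_mul(0xc2b2_ae35);
    h1 ^= h1 >> 16;
    h1
}

/// Hash a single i32 using `seed` (the row's running hash) as the seed.
fn hash_i32(value: i32, seed: u32) -> u32 {
    fmix(mix_h1(seed, mix_k1(value as u32)), 4)
}

/// Hypothetical helper: hash a plain column, updating each row's hash in place.
/// Null rows leave the running hash unchanged, as Spark does.
fn hash_plain(values: &[Option<i32>], hashes: &mut [u32]) {
    for (hash, &value) in hashes.iter_mut().zip(values) {
        if let Some(v) = value {
            *hash = hash_i32(v, *hash);
        }
    }
}

/// Hypothetical helper: hash a dictionary-encoded column by resolving each
/// row's key to its dictionary value first ("unpacking" the dictionary).
fn hash_dictionary(keys: &[Option<usize>], dict: &[Option<i32>], hashes: &mut [u32]) {
    for (hash, &key) in hashes.iter_mut().zip(keys) {
        // A null key or a null dictionary entry both make the row null.
        if let Some(v) = key.and_then(|k| dict[k]) {
            *hash = hash_i32(v, *hash);
        }
    }
}

/// Decode a dictionary column into a plain column (used to cross-check).
fn decode(keys: &[Option<usize>], dict: &[Option<i32>]) -> Vec<Option<i32>> {
    keys.iter().map(|&key| key.and_then(|k| dict[k])).collect()
}

fn main() {
    // Tiny LCG so the example needs no external crates (stand-in for a real RNG).
    let mut state: u64 = 0x2545_f491_4f6c_dd1d;
    let mut next = || {
        state = state
            .wrapping_mul(6_364_136_223_846_793_005)
            .wrapping_add(1_442_695_040_888_963_407);
        (state >> 33) as u32
    };

    // Random dictionary with one null entry, and random keys with ~10% nulls.
    let dict: Vec<Option<i32>> = (0..8)
        .map(|i| if i == 3 { None } else { Some(next() as i32) })
        .collect();
    let keys: Vec<Option<usize>> = (0..100)
        .map(|_| {
            let r = next();
            if r % 10 == 0 { None } else { Some(r as usize % dict.len()) }
        })
        .collect();

    // Both paths start from Spark's default seed (42) and must agree row by row.
    let mut from_dict = vec![42u32; keys.len()];
    hash_dictionary(&keys, &dict, &mut from_dict);
    let mut from_plain = vec![42u32; keys.len()];
    hash_plain(&decode(&keys, &dict), &mut from_plain);
    assert_eq!(from_dict, from_plain);
    println!("dictionary and plain hashes agree for {} rows", keys.len());
}
```

The randomized cross-check in `main` is one possible way to test such a fix, assumed here for illustration: instead of hard-coding expected hash values, it asserts that the dictionary path and the decoded plain path produce identical hashes for arbitrary keys, values and nulls.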
This PR currently depends on #426; I will rebase once that's merged.
How are these changes tested?
Updated the tests to use randomized input.
@viirya @kazuyukitanimura @sunchao PTAL when you have time.
Gentle ping @viirya @sunchao and @andygrove