fix: Compute murmur3 hash with dictionary input correctly

Open advancedxy opened this issue 1 year ago • 2 comments

Which issue does this PR close?

Closes #427

Rationale for this change

Bug fix. While working on #424, we found that spark_hash does not handle dictionary arrays correctly. This PR fixes that bug first.

What changes are included in this PR?

  1. Refactor parts of spark_hash.rs to prepare for xxhash64 support
  2. Unpack dictionary arrays when computing hashes
  3. Update the test
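
To illustrate item 2, here is a minimal, self-contained sketch of hashing a dictionary-encoded column by first unpacking it. It is not the code from this PR: `DictInt32`, `hash_column`, and `hash_dict_column` are hypothetical names, the column is a plain struct rather than an Arrow array, and nulls are left out. The mixing functions follow the standard 32-bit Murmur3 steps that Spark applies to an int. Because Spark uses each row's running hash as the seed for the next column, the seed can differ from row to row, so the sketch hashes the looked-up value of every row rather than hashing each dictionary entry once.

```rust
/// Hypothetical stand-in for a dictionary-encoded Int32 column:
/// `keys[i]` is an index into `values`. Nulls are omitted for brevity.
struct DictInt32 {
    keys: Vec<usize>,
    values: Vec<i32>,
}

impl DictInt32 {
    /// Materialize the logical column by looking up every key.
    fn unpack(&self) -> Vec<i32> {
        self.keys.iter().map(|&k| self.values[k]).collect()
    }
}

fn mix_k1(mut k1: u32) -> u32 {
    k1 = k1.wrapping_mul(0xcc9e2d51);
    k1 = k1.rotate_left(15);
    k1.wrapping_mul(0x1b873593)
}

fn mix_h1(mut h1: u32, k1: u32) -> u32 {
    h1 ^= k1;
    h1 = h1.rotate_left(13);
    h1.wrapping_mul(5).wrapping_add(0xe6546b64)
}

fn fmix(mut h1: u32, len: u32) -> u32 {
    h1 ^= len;
    h1 ^= h1 >> 16;
    h1 = h1.wrapping_mul(0x85ebca6b);
    h1 ^= h1 >> 13;
    h1 = h1.wrapping_mul(0xc2b2ae35);
    h1 ^ (h1 >> 16)
}

/// Murmur3 (x86, 32-bit) of a single 4-byte int with the given seed.
fn murmur3_int(value: i32, seed: u32) -> u32 {
    fmix(mix_h1(seed, mix_k1(value as u32)), 4)
}

/// Update each row's running hash in place, using it as the seed.
fn hash_column(values: &[i32], hashes: &mut [u32]) {
    for (h, &v) in hashes.iter_mut().zip(values) {
        *h = murmur3_int(v, *h);
    }
}

/// Hash a dictionary column by unpacking it to its logical values first.
fn hash_dict_column(col: &DictInt32, hashes: &mut [u32]) {
    hash_column(&col.unpack(), hashes);
}

fn main() {
    let dict = DictInt32 { keys: vec![0, 1, 0, 2], values: vec![10, 20, 30] };
    let mut hashes = vec![42u32; 4]; // 42 is Spark's default hash seed
    hash_dict_column(&dict, &mut hashes);
    println!("{:?}", hashes);
}
```

With this approach a dictionary-encoded column and the equivalent flat column produce the same hashes for the same seeds. The cost is the temporary unpacked vector, which a real implementation could avoid by looking up each value during the hashing loop.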

This PR currently depends on #426; I will rebase once that is merged.

How are these changes tested?

Updated the test to use randomized input.

advancedxy avatar May 15 '24 12:05 advancedxy

@viirya @kazuyukitanimura @sunchao PTAL when you have time.

advancedxy avatar May 16 '24 13:05 advancedxy

Gentle ping @viirya, @sunchao, and @andygrove.

advancedxy avatar May 20 '24 02:05 advancedxy