datafusion-comet
bug: hash expression is not consistent with Spark
Describe the bug
Our hash implementation does not produce the same results as Spark for some inputs.
I added this test to CometCastSuite because that is where our random data generators currently live (we should move them into a common class that more test suites can use).
test("hash") {
  val input = generateStrings(timestampPattern, 8).toDF("a")
  withTempPath { dir =>
    val data = roundtripParquet(input, dir).coalesce(1)
    data.createOrReplaceTempView("t")
    val df = spark.sql(s"select a, hash(a) from t order by a")
    checkSparkAnswerAndOperator(df)
  }
}
Example output:
!== Correct Answer - 1000 == == Spark Answer - 1000 ==
struct<a:string,hash(a):int> struct<a:string,hash(a):int>
![,142593372] [,0]
![ 099,-1611881412] [ 099,-881749019]
![ 1 474,240523873] [ 1 474,-1111423867]
![ 12852,-1057581169] [ 12852,-404859411]
![ 18,-492750382] [ 18,1333608017]
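To sanity-check individual rows without a Spark session, here is a minimal standalone sketch of the Murmur3 (x86, 32-bit) variant that, as far as I understand it, Spark's hash uses for strings: seed 42, 4-byte little-endian words, then each trailing byte mixed in on its own as a sign-extended int, then finalization with the byte length. The object and method names (SparkMurmur3Sketch, hashBytes, hashString) are made up for illustration and are not Spark or Comet APIs. For the empty string it reproduces the 142593372 shown in the Correct Answer column, so the 0 in the other column may point at the seed or finalization step being skipped for empty input, though that is a guess rather than a confirmed diagnosis.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical helper, not part of Spark or Comet: a sketch of the
// Murmur3 x86_32 variant assumed to back Spark's hash() for strings.
object SparkMurmur3Sketch {
  private val C1 = 0xcc9e2d51
  private val C2 = 0x1b873593

  // Scramble one 32-bit block before it is folded into the state.
  private def mixK1(k: Int): Int =
    Integer.rotateLeft(k * C1, 15) * C2

  // Fold a scrambled block into the running hash state.
  private def mixH1(h: Int, k: Int): Int =
    Integer.rotateLeft(h ^ k, 13) * 5 + 0xe6546b64

  // Finalization: mix in the byte length and avalanche the bits.
  private def fmix(h0: Int, length: Int): Int = {
    var h = h0 ^ length
    h ^= h >>> 16
    h *= 0x85ebca6b
    h ^= h >>> 13
    h *= 0xc2b2ae35
    h ^= h >>> 16
    h
  }

  def hashBytes(bytes: Array[Byte], seed: Int): Int = {
    val aligned = bytes.length - bytes.length % 4
    var h = seed
    var i = 0
    // Full 4-byte words, read in little-endian order.
    while (i < aligned) {
      val word = (bytes(i) & 0xff) |
        ((bytes(i + 1) & 0xff) << 8) |
        ((bytes(i + 2) & 0xff) << 16) |
        ((bytes(i + 3) & 0xff) << 24)
      h = mixH1(h, mixK1(word))
      i += 4
    }
    // Assumption: trailing bytes are mixed one at a time as
    // sign-extended ints, which differs from standard Murmur3.
    while (i < bytes.length) {
      h = mixH1(h, mixK1(bytes(i).toInt))
      i += 1
    }
    fmix(h, bytes.length)
  }

  // Seed 42 is assumed to be the default used by Spark's hash().
  def hashString(s: String, seed: Int = 42): Int =
    hashBytes(s.getBytes(StandardCharsets.UTF_8), seed)

  def main(args: Array[String]): Unit =
    println(hashString("")) // prints 142593372
}
```

Comparing the output of hashString against both columns for a failing input should show which side diverges, and whether the mismatch depends on the input length modulo 4.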
Steps to reproduce
No response
Expected behavior
No response
Additional context
No response