[VL] Diff on parquet agg
Backend
VL (Velox)
Bug description
spark.sql("set spark.gluten.enabled=false")
spark.range(100).selectExpr("id%2 as c1", "id%5 as c2", "id as c3").write.mode("overwrite").parquet("tmp/t1")
spark.sql("set spark.gluten.enabled=true")
spark.read.parquet("tmp/t1").createOrReplaceTempView("t1")
spark.sql("select c2, sum(c3) from t1 where c1= 1 group by c2").show
result
+---+---------------+
| c2|        sum(c3)|
+---+---------------+
|  0|559882429285360|
|  1|559885503421750|
|  3|839826576815406|
|  2|839827141809990|
|  4|559885785918562|
+---+---------------+
I tested three versions. velox-08-27 returns the correct result, but velox-10-04 and velox-10-11 both return wrong sums like the ones above.
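For reference, the correct sums are small and can be checked without Spark. The following is a minimal sketch in plain Scala that mirrors the query's filter and grouping over the same 100 ids; the name `expected` is only illustrative and not part of the original report.

```scala
// Plain-Scala equivalent of:
//   select c2, sum(c3) from t1 where c1 = 1 group by c2
// over spark.range(100) with c1 = id % 2, c2 = id % 5, c3 = id.
val expected: Map[Long, Long] =
  (0L until 100L)
    .filter(id => id % 2 == 1)               // where c1 = 1
    .groupBy(id => id % 5)                   // group by c2
    .map { case (c2, ids) => c2 -> ids.sum } // sum(c3)

// Print the groups in c2 order for comparison with the show() output.
expected.toSeq.sorted.foreach { case (c2, total) => println(s"$c2 -> $total") }
// 0 -> 500
// 1 -> 460
// 2 -> 520
// 3 -> 480
// 4 -> 540
```

No group sum can exceed 540 for this data, so the 15-digit values reported by the query are corrupted rather than slightly off.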
Spark version
Spark-3.4.x
Spark configurations
No response
System information
No response
Relevant logs
No response
cc @rui-mo @FelixYBW @zhztheplayer Have you encountered similar problems?
No. Looks like a new Velox bug. Would you debug it?
Sorry, the new version fails to compile on my local Mac, so I can't debug it for the time being.
One more thing: I cannot reproduce it on Mac.
Testing shows that https://github.com/facebookincubator/velox/pull/11010 caused this. The query works again after reverting it.
Thank you for the update. Did you submit an issue to Velox?
https://github.com/facebookincubator/velox/issues/11257
Resolved by https://github.com/facebookincubator/velox/pull/12176