[SPARK-48719][SQL] Fix the calculation bug of `RegrSlope` & `RegrIntercept` when the first parameter is null

Open wayneguow opened this issue 1 year ago • 2 comments

What changes were proposed in this pull request?

This PR fixes a calculation bug in `RegrSlope` and `RegrIntercept` that occurs when the first parameter is null. Regardless of whether the first parameter (y) or the second parameter (x) is null, the tuple should be filtered out.
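To illustrate the intended semantics, here is a minimal plain-Python sketch. It is not Spark's implementation, and the helper `non_null_pairs` and the sample data are hypothetical. It assumes the usual least-squares definitions (slope = covariance of x and y over variance of x, intercept = mean(y) minus slope times mean(x)) and drops any (y, x) tuple in which either value is null before aggregating.

```python
from typing import List, Optional, Sequence, Tuple

# Each tuple is (y, x), matching the argument order regr_slope(y, x).
Pair = Tuple[Optional[float], Optional[float]]


def non_null_pairs(pairs: Sequence[Pair]) -> List[Tuple[float, float]]:
    """Keep only tuples where BOTH y and x are non-null (hypothetical helper)."""
    return [(y, x) for y, x in pairs if y is not None and x is not None]


def regr_slope(pairs: Sequence[Pair]) -> Optional[float]:
    """Least-squares slope over the non-null tuples, or None if undefined."""
    kept = non_null_pairs(pairs)
    n = len(kept)
    if n == 0:
        return None
    mean_y = sum(y for y, _ in kept) / n
    mean_x = sum(x for _, x in kept) / n
    sxx = sum((x - mean_x) ** 2 for _, x in kept)
    if sxx == 0:
        # All x values are identical, so the slope is undefined.
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for y, x in kept)
    return sxy / sxx


def regr_intercept(pairs: Sequence[Pair]) -> Optional[float]:
    """Least-squares intercept over the non-null tuples, or None if undefined."""
    slope = regr_slope(pairs)
    if slope is None:
        return None
    kept = non_null_pairs(pairs)
    n = len(kept)
    mean_y = sum(y for y, _ in kept) / n
    mean_x = sum(x for _, x in kept) / n
    return mean_y - slope * mean_x


# Sample data: the tuples with a null y or a null x are both ignored,
# leaving (1, 1), (3, 2), (5, 3), which lie on the line y = 2x - 1.
rows = [(1.0, 1.0), (None, 2.0), (3.0, 2.0), (5.0, 3.0), (2.0, None)]
print(regr_slope(rows))      # 2.0
print(regr_intercept(rows))  # -1.0
```

If the tuple `(None, 2.0)` still contributed its x value to the aggregation, the x statistics would be computed over a different set of rows than the y statistics, and the slope and intercept would no longer describe the same data.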

Why are the changes needed?

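Bug fix: tuples whose first parameter (y) is null were not filtered out, so the results of `RegrSlope` and `RegrIntercept` were incorrect.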

Does this PR introduce any user-facing change?

Yes. The result changes when the first value of a tuple is null, but the new value is the correct one.

How was this patch tested?

Pass GA and test with `build/sbt "~sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z linear-regression.sql"`

Was this patch authored or co-authored using generative AI tooling?

No.

wayneguow avatar Jun 26 '24 16:06 wayneguow

cc @beliefer

wayneguow avatar Jun 27 '24 06:06 wayneguow

Gentle ping @HyukjinKwon, when you have time.

wayneguow avatar Jun 28 '24 07:06 wayneguow

Seems OK, cc @beliefer and @cloud-fan

LuciferYang avatar Jul 03 '24 07:07 LuciferYang

thanks, merging to master!

cloud-fan avatar Jul 05 '24 14:07 cloud-fan

can you open a backport PR for 3.5?

cloud-fan avatar Jul 05 '24 14:07 cloud-fan

> can you open a backport PR for 3.5?

OK, let me do it. Thank you for the review, @cloud-fan.

wayneguow avatar Jul 05 '24 14:07 wayneguow