[SPARK-48549][SQL] Restrict the number of parameters for function `sentences` to `1` or `3`
What changes were proposed in this pull request?
The pr aims to
restrictthenumberof parameters for functionsentencesto1or3.- fix very weird dataset's schema when use the function
sentences(will be showed below) codegensupport forsentences
Why are the changes needed?
-
Fix inconsistency in using the function
sentencesin the following scenariosAccording to the definition of function
sentences, we should only allow the following two kinds of parameter calls: A.sentences(str) B.sentences(str, language, country) - the parameterslanguageandcountryeither coexist or do not exist at the same timeIn file
sql/core/src/main/scala/org/apache/spark/sql/functions.scala, only the following two functions are defined: https://github.com/apache/spark/blob/f4434c36cc4f7b0147e0e8fe26ac0f177a5199cd/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L4273-L4282 -
Fix very weird dataset's schema when use the function
sentences
Does this PR introduce any user-facing change?
Yes,
How was this patch tested?
- Add new UT & Update existed UT.
- Pass GA.
Was this patch authored or co-authored using generative AI tooling?
No.
After the pr:
-
When the number of the parameters is
2, an error message is prompted: -
dataset's head has been corrected:
cc @HyukjinKwon @cloud-fan
also cc @MaxGekk
Do we have a strong reason to make this breaking change? How bad is it when calling sentences with two parameters?
Do we have a strong reason to make this breaking change? How bad is it when calling
sentenceswith two parameters?
Mainly because the inconsistent use of function sentences in the following scenarios may cause confusion for end-users
One allows 2 parameters, while the other does not.
We shall not break existing SQL user's code only because it is inconsistent to Scala/Python APIs. As I said let's document the existing behaviour of SQL func first of all, and separately think of extending of other APIs.
We shall not break existing SQL user's code only because it is inconsistent to Scala/Python APIs. As I said let's document the existing behaviour of SQL func first of all, and separately think of extending of other APIs.
Okay, I see what you mean, let me try to do it.
We shall not break existing SQL user's code only because it is inconsistent to Scala/Python APIs. As I said let's document the existing behaviour of SQL func first of all, and separately think of extending of other APIs.
Done.
@panbingkun I made slight modifications to your PR description for better readability:
- ``` -> ```scala
- Added ` to escape $"str"
Thank you for working on that!
@panbingkun I made slight modifications to your PR description for better readability:
- ``` -> ```scala - Added ` to escape $"str"Thank you for working on that!
Thank you very much for your meticulous work! ❤️
Could you review once more, @MaxGekk ?
@dongjoon-hyun Thank you for the ping. Will do soon.
Thank you so much, @MaxGekk !
+1, LGTM. Merging to master. Thank you, @panbingkun and @cloud-fan @dongjoon-hyun @HyukjinKwon for review.
Thank you again!