spark icon indicating copy to clipboard operation
spark copied to clipboard

[SPARK-48549][SQL] Restrict the number of parameters for function `sentences` to `1` or `3`

Open panbingkun opened this issue 1 year ago • 5 comments

What changes were proposed in this pull request?

The pr aims to

  • restrict the number of parameters for function sentences to 1 or 3.
  • fix very weird dataset's schema when use the function sentences (will be showed below)
  • codegen support for sentences

Why are the changes needed?

  • Fix inconsistency in using the function sentences in the following scenarios image

    According to the definition of function sentences, we should only allow the following two kinds of parameter calls: A.sentences(str) B.sentences(str, language, country) - the parameters language and country either coexist or do not exist at the same time

    In file sql/core/src/main/scala/org/apache/spark/sql/functions.scala, only the following two functions are defined: https://github.com/apache/spark/blob/f4434c36cc4f7b0147e0e8fe26ac0f177a5199cd/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L4273-L4282

  • Fix very weird dataset's schema when use the function sentences image

Does this PR introduce any user-facing change?

Yes,

How was this patch tested?

  • Add new UT & Update existed UT.
  • Pass GA.

Was this patch authored or co-authored using generative AI tooling?

No.

panbingkun avatar Jun 05 '24 12:06 panbingkun

After the pr:

  • When the number of the parameters is 2, an error message is prompted: image

  • dataset's head has been corrected: image image

panbingkun avatar Jun 06 '24 07:06 panbingkun

cc @HyukjinKwon @cloud-fan

panbingkun avatar Jun 06 '24 07:06 panbingkun

also cc @MaxGekk

panbingkun avatar Aug 14 '24 07:08 panbingkun

Do we have a strong reason to make this breaking change? How bad is it when calling sentences with two parameters?

cloud-fan avatar Aug 14 '24 17:08 cloud-fan

Do we have a strong reason to make this breaking change? How bad is it when calling sentences with two parameters?

Mainly because the inconsistent use of function sentences in the following scenarios may cause confusion for end-users image One allows 2 parameters, while the other does not.

panbingkun avatar Aug 27 '24 06:08 panbingkun

We shall not break existing SQL user's code only because it is inconsistent to Scala/Python APIs. As I said let's document the existing behaviour of SQL func first of all, and separately think of extending of other APIs.

MaxGekk avatar Sep 01 '24 09:09 MaxGekk

We shall not break existing SQL user's code only because it is inconsistent to Scala/Python APIs. As I said let's document the existing behaviour of SQL func first of all, and separately think of extending of other APIs.

Okay, I see what you mean, let me try to do it.

panbingkun avatar Sep 02 '24 05:09 panbingkun

We shall not break existing SQL user's code only because it is inconsistent to Scala/Python APIs. As I said let's document the existing behaviour of SQL func first of all, and separately think of extending of other APIs.

Done.

panbingkun avatar Sep 02 '24 08:09 panbingkun

@panbingkun I made slight modifications to your PR description for better readability:

- ``` -> ```scala
- Added ` to escape $"str"

Thank you for working on that!

xinrong-meng avatar Sep 06 '24 02:09 xinrong-meng

@panbingkun I made slight modifications to your PR description for better readability:

- ``` -> ```scala
- Added ` to escape $"str"

Thank you for working on that!

Thank you very much for your meticulous work! ❤️

panbingkun avatar Sep 06 '24 03:09 panbingkun

Could you review once more, @MaxGekk ?

dongjoon-hyun avatar Sep 11 '24 17:09 dongjoon-hyun

@dongjoon-hyun Thank you for the ping. Will do soon.

MaxGekk avatar Sep 11 '24 17:09 MaxGekk

Thank you so much, @MaxGekk !

dongjoon-hyun avatar Sep 11 '24 17:09 dongjoon-hyun

+1, LGTM. Merging to master. Thank you, @panbingkun and @cloud-fan @dongjoon-hyun @HyukjinKwon for review.

MaxGekk avatar Sep 11 '24 18:09 MaxGekk

Thank you again!

dongjoon-hyun avatar Sep 11 '24 19:09 dongjoon-hyun