
[SPARK-39938][PYTHON][PS] Accept all inputs of prefix/suffix which implement __str__ in add_prefix/add_suffix

bzhaoopenstack opened this issue 2 years ago • 2 comments

We need to follow pandas' behavior when validating the prefix/suffix parameter of add_prefix/add_suffix.

Currently, PySpark requires the argument to be a str. pandas, however, accepts any value that can be formatted as a string (i.e., anything that implements __str__), so the two behave differently.
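Because object itself defines __str__ (falling back to __repr__), the criterion "implements __str__" admits essentially every Python value, while an isinstance(x, str) check rejects all of them except actual strings. A short plain-Python illustration, no Spark involved; Plain and Tag are made-up classes used only for this sketch:

```python
class Plain:
    """Defines no __str__ of its own; inherits object.__str__ (illustrative only)."""


class Tag:
    """Overrides __str__ (illustrative only)."""

    def __str__(self) -> str:
        return "tag"


values = [666, 0.1, True, None, Tag(), Plain()]

for value in values:
    # hasattr(value, "__str__") is True for every object, because __str__ is
    # inherited from object; only a real str passes the isinstance check.
    print(
        f"{type(value).__name__:>8}: "
        f"isinstance(str)={isinstance(value, str)}, str()={str(value)!r}"
    )
```

For Plain, str() returns the default "<... object at 0x...>" text, so the exact output of that row depends on the memory address.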

What changes were proposed in this pull request?

Accept any prefix/suffix input that can be formatted as a string.
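A minimal sketch of the proposed relaxation, written against a plain list of column labels rather than PySpark's internal frame. The helpers add_prefix_labels and add_suffix_labels are hypothetical and not part of the PySpark API; the idea is only that, instead of assert isinstance(suffix, str), the value is converted with str() before being joined to each label:

```python
from typing import Any, List


def add_prefix_labels(labels: List[Any], prefix: Any) -> List[str]:
    # Hypothetical helper: convert the prefix with str() instead of asserting it is a str.
    prefix_str = str(prefix)
    return [prefix_str + str(label) for label in labels]


def add_suffix_labels(labels: List[Any], suffix: Any) -> List[str]:
    # Hypothetical helper: same idea for the suffix.
    suffix_str = str(suffix)
    return [str(label) + suffix_str for label in labels]


print(add_suffix_labels(["A", "B"], 0.1))   # ['A0.1', 'B0.1']
print(add_suffix_labels(["A", "B"], True))  # ['ATrue', 'BTrue']
print(add_prefix_labels(["A", "B"], 666))   # ['666A', '666B']
```

For 0.1 and True, the resulting labels match the pandas column names in the transcript of this PR.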

Why are the changes needed?

pandas and PySpark behave differently when a non-string value is passed to add_prefix/add_suffix.

PySpark:
>>> from pyspark import pandas as ps
>>> df = ps.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]}, columns=['A', 'B'])
>>> df.add_suffix(666)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/spark/spark/python/pyspark/pandas/frame.py", line 9060, in add_suffix
    assert isinstance(suffix, str)
AssertionError
>>> df.add_suffix(True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/spark/spark/python/pyspark/pandas/frame.py", line 9060, in add_suffix
    assert isinstance(suffix, str)
AssertionError

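pandas 1.3.x/1.4.x:

>>> import pandas as pd
>>> pdf = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]}, columns=['A', 'B'])
>>> pdf.add_suffix(0.1)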
   A0.1  B0.1
0     1     3
1     2     4
2     3     5
3     4     6
>>> pdf.add_suffix(True)
   ATrue  BTrue
0      1      3
1      2      4
2      3      5
3      4      6

Does this PR introduce any user-facing change?

No

How was this patch tested?

Called add_prefix/add_suffix with various inputs that can be converted to strings.
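The PR text does not include the test itself. A sketch of what a stringable-input check could look like, written as a pytest-style function against a hypothetical label helper (add_suffix_labels and Tag are illustrative names, not PySpark's real test suite or API):

```python
from typing import Any, List


class Tag:
    """A user-defined type with a custom __str__ (illustrative only)."""

    def __str__(self) -> str:
        return "tag"


def add_suffix_labels(labels: List[Any], suffix: Any) -> List[str]:
    # Hypothetical helper under test: str() the suffix instead of asserting its type.
    return [str(label) + str(suffix) for label in labels]


def test_add_suffix_accepts_stringable_inputs() -> None:
    # Expected labels follow the pandas output shown in this PR for 0.1 and True.
    cases = [
        ("_x", ["A_x", "B_x"]),
        (666, ["A666", "B666"]),
        (0.1, ["A0.1", "B0.1"]),
        (True, ["ATrue", "BTrue"]),
        (Tag(), ["Atag", "Btag"]),
    ]
    for suffix, expected in cases:
        assert add_suffix_labels(["A", "B"], suffix) == expected, suffix


if __name__ == "__main__":
    test_add_suffix_accepts_stringable_inputs()
    print("all cases passed")
```

A real PySpark test would instead compare ps.DataFrame.add_suffix against pandas' add_suffix on the same data.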

bzhaoopenstack, Aug 02 '22 01:08

cc @xinrong-meng @itholic @zhengruifeng FYI

HyukjinKwon, Aug 02 '22 03:08

Can one of the admins verify this patch?

AmplabJenkins, Aug 02 '22 20:08

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions[bot], Nov 11 '22 00:11