[SPARK-39938][PYTHON][PS] Accept all inputs of prefix/suffix which implement __str__ in add_prefix/add_suffix
We need to follow the pandas behavior for prefix/suffix parameter validation in add_prefix/add_suffix.
Currently, PySpark asserts that the argument is a str. pandas, however, accepts any value that can be formatted as a string (that is, any object implementing __str__), so the two behaviors differ.
What changes were proposed in this pull request?
Support every kind of input that can be formatted as a string.
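As a rough illustration of the intended behavior, the sketch below formats the prefix/suffix into each label instead of asserting that it is a str. The helper names add_prefix_to_labels and add_suffix_to_labels and the Tag class are hypothetical and exist only for this example; this is not the actual change to pyspark/pandas/frame.py.

```python
from typing import Any, List


def add_prefix_to_labels(labels: List[Any], prefix: Any) -> List[str]:
    """Hypothetical helper: prepend ``prefix`` to every label.

    ``str.format`` formats each argument with an empty format spec,
    which for ordinary objects gives the same text as ``str(obj)``,
    so no ``isinstance(prefix, str)`` check is needed.
    """
    return ["{}{}".format(prefix, label) for label in labels]


def add_suffix_to_labels(labels: List[Any], suffix: Any) -> List[str]:
    """Hypothetical helper: append ``suffix`` to every label."""
    return ["{}{}".format(label, suffix) for label in labels]


class Tag:
    """Illustrative user-defined type that implements __str__."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __str__(self) -> str:
        return "_" + self.name


columns = ["A", "B"]
print(add_suffix_to_labels(columns, 666))         # int suffix
print(add_suffix_to_labels(columns, True))        # bool suffix
print(add_suffix_to_labels(columns, Tag("raw")))  # custom object suffix
print(add_prefix_to_labels(columns, 0.1))         # float prefix
```

Formatting rather than type-checking is what lets int, bool, float, and user-defined objects pass through, which is the pandas behavior described above.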
Why are the changes needed?
pandas and PySpark behave differently when a value of a non-string type is passed to add_prefix/add_suffix.
PySpark:
>>> from pyspark import pandas as ps
>>> df = ps.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]}, columns=['A', 'B'])
>>> df.add_suffix(666)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/spark/spark/python/pyspark/pandas/frame.py", line 9060, in add_suffix
    assert isinstance(suffix, str)
AssertionError
>>> df.add_suffix(True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/spark/spark/python/pyspark/pandas/frame.py", line 9060, in add_suffix
    assert isinstance(suffix, str)
AssertionError
Pandas: 1.3.X/1.4.X
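>>> import pandas as pd
>>> pdf = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [3, 4, 5, 6]}, columns=['A', 'B'])
>>> pdf.add_suffix(0.1)
   A0.1  B0.1
0     1     3
1     2     4
2     3     5
3     4     6
>>> pdf.add_suffix(True)
   ATrue  BTrue
0      1      3
1      2      4
2      3      5
3      4      6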
Does this PR introduce any user-facing change?
No
How was this patch tested?
Tested by passing inputs that can be converted to a string to the add_prefix/add_suffix functions.
cc @xinrong-meng @itholic @zhengruifeng FYI
Can one of the admins verify this patch?
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!