spark
spark copied to clipboard
[SPARK-39895][SQL][PYTHON][R] Support multiple column drop
What changes were proposed in this pull request?
Pyspark dataframe drop has following signature:
def drop(self, *cols: "ColumnOrName") -> "DataFrame":
However when we try to pass multiple Column types to drop function it raises TypeError
each col in the param list should be a string
Minimal reproducible example:
values = [("id_1", 5, 9), ("id_2", 5, 1), ("id_3", 4, 3), ("id_1", 3, 3), ("id_2", 4, 3)]
df = spark.createDataFrame(values, "id string, point int, count int")
df.drop(df.point, df.count)
It spits out following:
/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py in drop(self, *cols)
2537 for col in cols:
2538 if not isinstance(col, str):
-> 2539 raise TypeError("each col in the param list should be a string")
2540 jdf = self._jdf.drop(self._jseq(cols))
2541
TypeError: each col in the param list should be a string
Why are the changes needed?
We expect that multiple columns can be handled by drop call on df because of its typing but that is not the case.
Does this PR introduce any user-facing change?
Yes, fixes issues related type confirmation in pyspark api
How was this patch tested?
Added missing tests for regression testing. CI Pipeline on fork and CI here will test them.
This PR seems to over-claim. Apache Spark already supports multi-column drop like the following. Please be more specific about your contribution.
>>> spark.version '3.2.2' >>> df = spark.createDataFrame([("A", 50, "Y"), ("B", 60, "Y")], ["name", "age", "active"]) >>> df.drop("name", "age").show() +------+ |active| +------+ | Y| | Y| +------+
@dongjoon-hyun JIRA ticket contains reproducible example. I will update the description on this PR for convenience! The patch is related to df.drop(Column*)
and not df.drop(str*)
.
Can one of the admins verify this patch?
Looks pretty good. Let me take a closer look tmr.
@zhengruifeng Thanks for your review, I have addressed your comments.
Looks pretty good. Let me take a closer look tmr.
@HyukjinKwon would you like to take another look?
Merged to master.
Thanks.
Thank you, @santosh-d3vpl3x , @HyukjinKwon , @zhengruifeng .