spark icon indicating copy to clipboard operation
spark copied to clipboard

[SPARK-39895][SQL][PYTHON][R] Support multiple column drop

Open santosh-d3vpl3x opened this issue 2 years ago • 5 comments

What changes were proposed in this pull request?

Pyspark dataframe drop has following signature: def drop(self, *cols: "ColumnOrName") -> "DataFrame":

However when we try to pass multiple Column types to drop function it raises TypeError

each col in the param list should be a string

Minimal reproducible example:

values = [("id_1", 5, 9), ("id_2", 5, 1), ("id_3", 4, 3), ("id_1", 3, 3), ("id_2", 4, 3)]
df = spark.createDataFrame(values, "id string, point int, count int")
df.drop(df.point, df.count)

It spits out following:

/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py in drop(self, *cols)
2537 for col in cols:
2538 if not isinstance(col, str):
-> 2539 raise TypeError("each col in the param list should be a string")
2540 jdf = self._jdf.drop(self._jseq(cols))
2541

TypeError: each col in the param list should be a string

Why are the changes needed?

We expect that multiple columns can be handled by drop call on df because of its typing but that is not the case.

Does this PR introduce any user-facing change?

Yes, fixes issues related type confirmation in pyspark api

How was this patch tested?

Added missing tests for regression testing. CI Pipeline on fork and CI here will test them.

santosh-d3vpl3x avatar Jul 28 '22 19:07 santosh-d3vpl3x

This PR seems to over-claim. Apache Spark already supports multi-column drop like the following. Please be more specific about your contribution.

>>> spark.version
'3.2.2'
>>> df = spark.createDataFrame([("A", 50, "Y"), ("B", 60, "Y")], ["name", "age", "active"])
>>> df.drop("name", "age").show()
+------+
|active|
+------+
|     Y|
|     Y|
+------+

@dongjoon-hyun JIRA ticket contains reproducible example. I will update the description on this PR for convenience! The patch is related to df.drop(Column*) and not df.drop(str*).

santosh-d3vpl3x avatar Jul 29 '22 08:07 santosh-d3vpl3x

Can one of the admins verify this patch?

AmplabJenkins avatar Jul 29 '22 09:07 AmplabJenkins

Looks pretty good. Let me take a closer look tmr.

HyukjinKwon avatar Jul 31 '22 05:07 HyukjinKwon

@zhengruifeng Thanks for your review, I have addressed your comments.

santosh-d3vpl3x avatar Aug 08 '22 14:08 santosh-d3vpl3x

Looks pretty good. Let me take a closer look tmr.

@HyukjinKwon would you like to take another look?

zhengruifeng avatar Aug 09 '22 03:08 zhengruifeng

Merged to master.

Thanks.

HyukjinKwon avatar Aug 11 '22 03:08 HyukjinKwon

Thank you, @santosh-d3vpl3x , @HyukjinKwon , @zhengruifeng .

dongjoon-hyun avatar Aug 11 '22 04:08 dongjoon-hyun