[SPARK-39832][PYTHON] Support column arguments in regexp_replace

physinet opened this pull request 3 years ago • 6 comments

What changes were proposed in this pull request?

Support either literal Python strings or Column objects for the pattern and replacement arguments of regexp_replace.

Why are the changes needed?

Allows using different replacements per row, as in this example.

Does this PR introduce any user-facing change?

Users can now call regexp_replace with columns for all three arguments. The first argument (the string the regex is applied to) can be either a Column object or the string name of a column. In summary, the following signatures are supported:

regexp_replace("str", r"\d", "")
regexp_replace(F.col("str"), r"\d", "")
regexp_replace("str", F.col("pattern"), F.col("replacement"))
regexp_replace(F.col("str"), F.col("pattern"), F.col("replacement"))
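
To illustrate what the column-argument form is meant to compute, here is a minimal plain-Python sketch of the per-row semantics. It is not Spark code and not the implementation in this PR: it uses the standard `re` module, the sample rows are invented for illustration, and `replace_per_row` is a hypothetical helper. Spark evaluates patterns with Java regular expressions, whose syntax differs from Python's in places (for example, group references in the replacement are written `$1` rather than `\1`), so treat this only as a model of the behavior.

```python
import re

# Hypothetical rows standing in for a DataFrame with the columns
# "str", "pattern", and "replacement" used in the signatures above.
rows = [
    {"str": "a1b22", "pattern": r"\d+", "replacement": "#"},
    {"str": "foo-bar", "pattern": r"-", "replacement": "_"},
    {"str": "hello", "pattern": r"l+", "replacement": "L"},
]


def replace_per_row(row):
    """Apply the row's own pattern and replacement to the row's own string."""
    return re.sub(row["pattern"], row["replacement"], row["str"])


# Each row uses a different pattern and a different replacement.
results = [replace_per_row(row) for row in rows]
print(results)
```

With literal string arguments, every row shares the same pattern and replacement; with column arguments, each row supplies its own, which is the case this sketch models.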

How was this patch tested?

Added unit tests.

physinet avatar Jul 28 '22 12:07 physinet

cc @zero323 FYI

HyukjinKwon avatar Jul 29 '22 02:07 HyukjinKwon

Any advice to solve the failing check?

Error: Unhandled error: Error: There was a new unsynced commit pushed. Please retrigger the workflow.

physinet avatar Jul 29 '22 12:07 physinet

Can one of the admins verify this patch?

AmplabJenkins avatar Jul 29 '22 15:07 AmplabJenkins

@physinet mind enabling https://github.com/physinet/spark/actions/workflows/build_main.yml and rebasing please? Apache Spark leverages the GitHub resources from the PR author's fork.

HyukjinKwon avatar Aug 02 '22 04:08 HyukjinKwon

@HyukjinKwon @zero323 are you happy with the latest changes?

physinet avatar Aug 04 '22 13:08 physinet

I am fine with this. I will leave it to @zero323

HyukjinKwon avatar Aug 05 '22 00:08 HyukjinKwon

All comments are addressed; this should be ready for merge.

physinet avatar Aug 19 '22 22:08 physinet

Merged into master.

zero323 avatar Aug 20 '22 15:08 zero323