[SPARK-39832][PYTHON] Support column arguments in regexp_replace
What changes were proposed in this pull request?
Support either literal Python strings or Column objects for the pattern and replacement arguments of regexp_replace.
Why are the changes needed?
Allows using a different pattern or replacement per row, since patterns and replacements can now be read from columns instead of being fixed literals.
Does this PR introduce any user-facing change?
Users can now use regexp_replace with columns for all three arguments. The first argument (string that regex should be applied to) can be either a Column object or the string name of the column. In summary, the following signatures are supported:
regexp_replace("str", r"\d", "")
regexp_replace(F.col("str"), r"\d", "")
regexp_replace("str", F.col("pattern"), F.col("replacement"))
regexp_replace(F.col("str"), F.col("pattern"), F.col("replacement"))
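To illustrate the per-row semantics these column arguments enable, here is a minimal pure-Python sketch using re.sub (the rows and column names are hypothetical, not taken from the PR's tests):

```python
import re

# Hypothetical rows: each row carries its own pattern and replacement,
# mirroring a DataFrame with "str", "pattern", and "replacement" columns.
rows = [
    {"str": "abc123", "pattern": r"\d+", "replacement": "#"},
    {"str": "a-b-c", "pattern": "-", "replacement": "_"},
]

# Per-row substitution, analogous to
# regexp_replace(F.col("str"), F.col("pattern"), F.col("replacement"))
result = [re.sub(r["pattern"], r["replacement"], r["str"]) for r in rows]
print(result)  # ['abc#', 'a_b_c']
```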
How was this patch tested?
Added unit tests.
cc @zero323 FYI
Any advice to solve the failing check?
Error: Unhandled error: Error: There was a new unsynced commit pushed. Please retrigger the workflow.
Can one of the admins verify this patch?
@physinet mind enabling https://github.com/physinet/spark/actions/workflows/build_main.yml and rebasing, please? Apache Spark leverages the GitHub resources from the PR author's fork.
@HyukjinKwon @zero323 are you happy with the latest changes?
I am fine with this. I will leave it to @zero323.
All comments are addressed; this should be ready for merge.
Merged into master.