[SPARK-39883][SQL][TESTS] Add DataFrame function parity check
What changes were proposed in this pull request?
This PR adds a test that compares the available list of DataFrame functions in org.apache.spark.sql.functions with the SQL function registry. This attempts to verify that the DataFrame functions are a subset of the functions in the SQL function registry (subject to exclusions and expectations). It also produces a list of the differences between the two.
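The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FunctionParity`, `Report`, `compare`, and `check` are hypothetical names, and the two input sets are assumed to have been collected elsewhere (for example, by reflecting over the public methods of org.apache.spark.sql.functions and by listing the names in the SQL function registry).

```scala
// Hypothetical helper for illustration only; none of these names come from the PR.
object FunctionParity {
  // Names that appear on one side but not the other, after exclusions.
  final case class Report(onlyInDataFrame: Set[String], onlyInSql: Set[String])

  // Compare the two name sets, ignoring anything listed in the exclusion sets.
  def compare(
      dataFrameFunctions: Set[String],
      sqlFunctions: Set[String],
      excludedDataFrame: Set[String] = Set.empty,
      excludedSql: Set[String] = Set.empty): Report = {
    val df = dataFrameFunctions -- excludedDataFrame
    val sql = sqlFunctions -- excludedSql
    Report(onlyInDataFrame = df -- sql, onlyInSql = sql -- df)
  }

  // The subset check "subject to expectations": the DataFrame-only names
  // must match exactly the set the test author has declared as expected.
  def check(report: Report, expectedOnlyInDataFrame: Set[String]): Boolean =
    report.onlyInDataFrame == expectedOnlyInDataFrame
}

// Usage with made-up inputs; real inputs would come from Spark itself.
val report = FunctionParity.compare(
  dataFrameFunctions = Set("abs", "col", "legacy_fn"),
  sqlFunctions = Set("abs", "if"),
  excludedDataFrame = Set("col"))
val ok = FunctionParity.check(report, expectedOnlyInDataFrame = Set("legacy_fn"))
```

Keeping the expected differences as an explicit set means that adding a new DataFrame function without a SQL counterpart fails the check until the expectation is updated deliberately.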
Why are the changes needed?
Currently there is no easy way to verify the differences between the two APIs.
Does this PR introduce any user-facing change?
No
How was this patch tested?
This PR only adds a test.
@cloud-fan or @MaxGekk, can you review this or suggest who else might? For comparison, see also https://github.com/apache/spark/pull/37144, the PySpark function parity check, which is already merged.
looks fine to me, cc @HyukjinKwon @viirya
@viirya @HyukjinKwon I've addressed both of your comments including removing the printed report. I've also merged master and updated the expectations for one added function.
Merged to master.