[SPARK-47069][PYTHON][CONNECT] Introduce `spark.profile.show/dump` for SparkSession-based profiling
What changes were proposed in this pull request?
Introduce `spark.profile.show/dump` for SparkSession-based profiling in non-Spark Connect sessions.
Why are the changes needed?
SparkContext-based profiling has `sc.dump_profiles`/`sc.show_profiles`, which cover both perf and memory profiling.
SparkSession-based profiling currently has separate APIs for each kind: `spark.dumpPerfProfiles`/`spark.showPerfProfiles` for perf profiling and `spark.dumpMemoryProfiles`/`spark.showMemoryProfiles` for memory profiling.
It would be more consistent and user-friendly to consolidate them into a single interface, `spark.profile.dump/show`.
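The consolidation idea is a single entry point that dispatches on an optional `type` instead of one method per profile kind. The toy class below is a hypothetical sketch of that pattern, not PySpark's implementation; `UnifiedProfile` and its backend callables are illustrative names, and only the `show(id, type=...)` parameter names mirror the API in this PR. A `dump(path, id=None, *, type=None)` method would follow the same dispatch.

```python
from typing import Callable, Dict, List, Optional, Tuple

ShowFn = Callable[[Optional[int]], None]


class UnifiedProfile:
    """Toy stand-in for a uniform profiling interface; not PySpark's implementation."""

    def __init__(self, show_fns: Dict[str, ShowFn]) -> None:
        # Maps a profile type ("perf", "memory") to its type-specific printer.
        self._show_fns = show_fns

    def show(self, id: Optional[int] = None, *, type: Optional[str] = None) -> None:
        # Parameter names mirror spark.profile.show(id, type=...) from this PR.
        if type is None:
            # No type requested: delegate to every registered profile type.
            for show_fn in self._show_fns.values():
                show_fn(id)
        elif type in self._show_fns:
            self._show_fns[type](id)
        else:
            raise ValueError(f"unknown profile type: {type!r}")


# Usage: record which backend is asked to print which UDF id.
calls: List[Tuple[str, Optional[int]]] = []
profile = UnifiedProfile({
    "perf": lambda udf_id: calls.append(("perf", udf_id)),
    "memory": lambda udf_id: calls.append(("memory", udf_id)),
})
profile.show()                  # every type, every UDF
profile.show(2, type="memory")  # one type, one UDF
print(calls)  # [('perf', None), ('memory', None), ('memory', 2)]
```

Keeping the type-specific logic behind a registry means callers learn one method pair, and adding a new profiler kind does not add new top-level methods.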
Does this PR introduce any user-facing change?
Yes. `spark.profile.show/dump` is now supported, and the following not-yet-released APIs are removed:
- `spark.dumpPerfProfiles`
- `spark.dumpMemoryProfiles`
- `spark.showPerfProfiles`
- `spark.showMemoryProfiles`
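For code written against a development build, the move to the new interface is mechanical. The helper below is hypothetical (`migrate_call` is not part of PySpark) and simply prints the replacement call. It assumes the removed methods took the same arguments as the new ones (an optional UDF id, plus a `path` for dumps) and that the new signatures are `show(id=None, *, type=None)` and `dump(path, id=None, *, type=None)`.

```python
from typing import Optional

# Removed method -> (new spark.profile method, profile type). Assumed mapping.
_REPLACEMENTS = {
    "showPerfProfiles": ("show", "perf"),
    "showMemoryProfiles": ("show", "memory"),
    "dumpPerfProfiles": ("dump", "perf"),
    "dumpMemoryProfiles": ("dump", "memory"),
}


def migrate_call(old_method: str, udf_id: Optional[int] = None,
                 path: Optional[str] = None) -> str:
    """Return the spark.profile.* call that replaces spark.<old_method>(...)."""
    method, profile_type = _REPLACEMENTS[old_method]
    args = []
    if method == "dump":
        if path is None:
            raise ValueError(f"{old_method} requires a path")
        args.append(repr(path))
    if udf_id is not None:
        args.append(str(udf_id))
    args.append(f"type={profile_type!r}")
    return f"spark.profile.{method}({', '.join(args)})"


print(migrate_call("showPerfProfiles", udf_id=2))  # spark.profile.show(2, type='perf')
```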
>>> from pyspark.sql.functions import col, udf
>>> spark.conf.set("spark.sql.pyspark.udf.profiler", "perf")  # enable the cProfile-based perf profiler
>>>
>>> @udf("string")
... def f(x):
...     return str(x)
...
>>> df = spark.range(10).select(f(col("id")))
>>> df.collect()
[Row(f(id)='0'), ...]
>>> spark.profile.show()
============================================================
Profile of UDF<id=2>
============================================================
...
>>> spark.profile.show(type="memory")  # no output: only the perf profiler was enabled
>>> spark.profile.show(type="perf")
============================================================
Profile of UDF<id=2>
============================================================
...
>>> spark.profile.show(2, type="perf")
============================================================
Profile of UDF<id=2>
============================================================
...
>>> spark.profile.show(2, type="memory")
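The transcript above only exercises `show`. If `spark.profile.dump(path, type="perf")` writes cProfile statistics as `.pstats` files under `path` (an assumption here, as is the file name `udf_2_perf.pstats`), the dumps can be inspected with the standard-library `pstats` module. To run without Spark, the sketch fakes such a dump by profiling the same `f` locally with `cProfile`; `load_perf_dumps` is a hypothetical helper.

```python
import cProfile
import glob
import os
import pstats
import tempfile
from typing import Dict


def f(x):
    # Same body as the UDF in the transcript, profiled here outside Spark.
    return str(x)


def load_perf_dumps(path: str) -> Dict[str, pstats.Stats]:
    """Load every *.pstats file directly under `path`, keyed by file name."""
    files = sorted(glob.glob(os.path.join(path, "*.pstats")))
    return {os.path.basename(p): pstats.Stats(p) for p in files}


# Simulate a dump: profile f over the same 10 inputs and write a .pstats file.
dump_dir = tempfile.mkdtemp()  # left on disk so the dump can be reopened later
profiler = cProfile.Profile()
profiler.enable()
for i in range(10):
    f(i)
profiler.disable()
profiler.dump_stats(os.path.join(dump_dir, "udf_2_perf.pstats"))  # hypothetical name

loaded = load_perf_dumps(dump_dir)
for name, stats in loaded.items():
    print(f"Profile dump: {name}")
    stats.sort_stats("cumulative").print_stats(5)  # top 5 entries by cumulative time
```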
How was this patch tested?
Unit tests.
Was this patch authored or co-authored using generative AI tooling?
No.
Adjusted PR description.
Thanks! Merging to master.
Thank you!