[SPARK-48605][CORE][UI] UI display driver thread first on the thread dump page
What changes were proposed in this pull request?
This PR aims to display the driver thread first on the thread dump page by defining a custom threadInfoOrdering.
Why are the changes needed?
Currently, an ordering rule displays task threads first on the executor thread dump page, which improves the user experience when troubleshooting "task stuck" issues. Similar stuck issues are common on the driver side too. For example, I frequently hit two cases in my daily support work:
- The Spark driver is blocked in user code written in the "main function".
- When the NameNode is under heavy load, the Spark driver is blocked on HDFS calls.
In YARN cluster mode, the "main function" runs in the "Driver" thread; in other modes, such as local, K8s, and YARN client mode, it runs in the "main" thread. Displaying these two threads first should therefore improve the UX when troubleshooting "driver stuck" issues.
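To illustrate the idea, here is a minimal, self-contained sketch of such an ordering. It is not the code in this patch: `ThreadEntry` is a hypothetical stand-in for a thread dump entry, the `priority` ranks are assumptions, and in particular the relative rank of driver threads versus task threads (matched here by the assumed name fragment "Executor task launch") is chosen only for illustration.

```scala
import java.util.Locale

// Hypothetical stand-in for one entry of a thread dump; not Spark's actual class.
final case class ThreadEntry(id: Long, name: String)

object ThreadDumpOrdering {
  // Assumed ranks: threads running the "main function" first, task threads next,
  // everything else last. Lower rank sorts earlier.
  private def priority(t: ThreadEntry): Int = t.name match {
    case "Driver" | "main" => 0
    case n if n.contains("Executor task launch") => 1
    case _ => 2
  }

  // Within the same rank, sort by case-insensitive name, then by thread id
  // so the order is deterministic.
  val ordering: Ordering[ThreadEntry] =
    Ordering.by((t: ThreadEntry) => (priority(t), t.name.toLowerCase(Locale.ROOT), t.id))
}

object ThreadDumpOrderingDemo {
  def main(args: Array[String]): Unit = {
    val threads = Seq(
      ThreadEntry(12L, "dispatcher-event-loop-0"),
      ThreadEntry(1L, "main"),
      ThreadEntry(40L, "Executor task launch worker for task 3"),
      ThreadEntry(27L, "Driver"))
    // Print thread names in display order.
    threads.sorted(ThreadDumpOrdering.ordering).foreach(t => println(t.name))
  }
}
```

Expressing the rule as an `Ordering` over a tuple keeps the tie-breaking explicit and makes it easy to unit test independently of the UI.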
Does this PR introduce any user-facing change?
Yes, it changes the default display order of threads on the thread dump page.
How was this patch tested?
UT:
build/sbt "core/testOnly org.apache.spark.util.UtilsSuite -- -z ThreadInfoOrdering"
Manually tested on YARN cluster mode:
Was this patch authored or co-authored using generative AI tooling?
No.
For YARN cluster mode, thread name "Driver" is set at https://github.com/apache/spark/blob/b5e1b7988031044d3cbdb277668b775c08db1a74/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L773
cc @yaooqinn @LuciferYang
cc @yaooqinn
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!