
[SPARK-48605][CORE][UI] UI display driver thread first on the thread dump page

Open pan3793 opened this issue 1 year ago • 3 comments

What changes were proposed in this pull request?

This PR aims to display the driver thread first on the thread dump page by defining a custom threadInfoOrdering.

Why are the changes needed?

Currently, an ordering rule displays task threads first on the executor thread dump page, which improves the user experience when troubleshooting "task stuck" issues. Similar stuck issues happen on the driver side too. For example, I hit two frequent cases in my daily support:

  1. The Spark driver is blocked in user code written in the "main function".
  2. When the NameNode is under high pressure, the Spark driver is blocked on HDFS calls.

In YARN cluster mode, the "main function" runs in the "Driver" thread; in other modes, such as local, K8s, and YARN client mode, it runs in the "main" thread. So I think displaying these two threads first should improve the UX when troubleshooting "driver stuck" issues.
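The display order motivated above can be sketched as a Scala `Ordering`. This is not the PR's actual `threadInfoOrdering`, only a minimal illustration under stated assumptions: `ThreadSnapshot` is a hypothetical stand-in for `java.lang.management.ThreadInfo` (which has no public constructor), the "Driver" and "main" threads are matched by exact name, and task threads are assumed to be recognizable by the "Executor task launch" substring and are placed right after the driver threads. Ties break on the lower-cased name, then on the thread id.

```scala
import java.util.Locale

// Hypothetical stand-in for java.lang.management.ThreadInfo; only the
// fields the ordering needs are modeled.
final case class ThreadSnapshot(id: Long, name: String)

object ThreadDumpOrdering {
  // Assumption: "Driver" (YARN cluster mode) and "main" (local, K8s,
  // YARN client mode) are matched by exact thread name.
  private val driverThreadNames = Set("Driver", "main")

  // Lower rank sorts first: driver threads, then task threads, then the rest.
  private def rank(t: ThreadSnapshot): Int =
    if (driverThreadNames.contains(t.name)) 0
    else if (t.name.contains("Executor task launch")) 1
    else 2

  // Within a rank, ties are broken by case-insensitive name, then thread id.
  val threadSnapshotOrdering: Ordering[ThreadSnapshot] =
    Ordering.by((t: ThreadSnapshot) => (rank(t), t.name.toLowerCase(Locale.ROOT), t.id))

  def main(args: Array[String]): Unit = {
    val threads = Seq(
      ThreadSnapshot(12L, "dispatcher-event-loop-0"),
      ThreadSnapshot(45L, "Executor task launch worker for task 0.0"),
      ThreadSnapshot(1L, "main"),
      ThreadSnapshot(30L, "Driver"))
    val sorted = threads.sorted(threadSnapshotOrdering)
    // prints: Driver, main, Executor task launch worker for task 0.0, dispatcher-event-loop-0
    println(sorted.map(_.name).mkString(", "))
  }
}
```

Ranking by a tuple keeps the rule declarative: privileging another thread name only changes `rank`, while the name and id tie-breakers keep the rest of the page stable between refreshes.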

Does this PR introduce any user-facing change?

Yes, it changes the default display order of threads on the thread dump page.

How was this patch tested?

UT:

build/sbt "core/testOnly org.apache.spark.util.UtilsSuite -- -z ThreadInfoOrdering"

Manually tested on YARN cluster mode (screenshots: Xnip2024-06-12_22-38-25, Xnip2024-06-12_22-39-25).

Was this patch authored or co-authored using generative AI tooling?

No.

pan3793 • Jun 12 '24

For YARN cluster mode, the thread name "Driver" is set at https://github.com/apache/spark/blob/b5e1b7988031044d3cbdb277668b775c08db1a74/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L773
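Why the name matters can be illustrated with a minimal sketch. It is not the ApplicationMaster code: `startNamedThread` and `dumpContains` are hypothetical helpers, and a `CountDownLatch` simulates a user main function that is stuck. The sketch starts a thread named "Driver", takes a thread dump with the standard `ThreadMXBean.dumpAllThreads` API, and checks that the name is visible there, which is what any name-based ordering relies on.

```scala
import java.lang.management.ManagementFactory
import java.util.concurrent.CountDownLatch

object DriverThreadDemo {
  // Hypothetical helper: run `body` on a dedicated thread with the given name,
  // mimicking a user main function hosted on a named thread.
  def startNamedThread(name: String)(body: => Unit): Thread = {
    val t = new Thread(new Runnable { override def run(): Unit = body })
    t.setName(name)
    t.setDaemon(true)
    t.start()
    t
  }

  // True if a live thread with exactly this name appears in a thread dump.
  def dumpContains(name: String): Boolean = {
    val infos = ManagementFactory.getThreadMXBean.dumpAllThreads(false, false)
    // Entries can be null for threads that exited while the dump was taken.
    infos.filter(_ != null).exists(_.getThreadName == name)
  }

  def main(args: Array[String]): Unit = {
    val started = new CountDownLatch(1)
    val release = new CountDownLatch(1)
    // Simulate a "stuck" main function: it blocks until released.
    val driver = startNamedThread("Driver") { started.countDown(); release.await() }
    started.await()
    println(dumpContains("Driver")) // prints: true
    release.countDown()
    driver.join()
  }
}
```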

pan3793 • Jun 12 '24

cc @yaooqinn @LuciferYang

pan3793 • Jun 13 '24

cc @yaooqinn

LuciferYang • Jun 14 '24

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions[bot] • Sep 23 '24