
[SPARK-38617][SQL][WEBUI] Show Spark rule and phase timings in SQL UI and REST API

yliou opened this pull request • 12 comments

What changes were proposed in this pull request?

This PR adds a clickable field on the SQL UI that shows the timing of Spark phases and per-rule timing information from QueryPlanningTracker, along with a separate REST endpoint that exposes the same timing statistics. It also adds a SQLConf to control the number of rules shown on the UI and returned by the REST endpoint (default value is 15).

UI change: Screen Shot 2022-05-25 at 11 04 50 AM
New REST API endpoint output: SPARK-38617RestOutput.txt
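As a rough illustration only, the entry below shows how such a setting is typically declared inside SQLConf.scala; the key name, doc text, version, and default here are assumptions for this sketch, not necessarily what the PR uses.

// Sketch of a SQLConf entry as it would appear inside the SQLConf object.
// The key name, doc text, and version are illustrative assumptions.
val UI_PLANNING_RULE_COUNT = buildConf("spark.sql.ui.queryPlanningRuleCount")
  .doc("Maximum number of Catalyst rules, ordered by time spent, to show " +
    "on the SQL UI page and in the REST API output.")
  .version("3.4.0")
  .intConf
  .checkValue(_ > 0, "The number of rules shown must be positive.")
  .createWithDefault(15)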

Why are the changes needed?

Currently, users cannot see Spark phase timing information, or which Spark rules take the most time, for a specific query execution on the UI. This PR adds that information.
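For context, the phase and rule timings in question are already collected by QueryPlanningTracker; they are just not surfaced in the UI. A minimal sketch of reading them programmatically, assuming the tracker API available in Spark 3.x:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("planning-timings-demo")
  .master("local[*]")
  .getOrCreate()

val df = spark.sql("SELECT id % 10 AS k, count(*) AS cnt FROM range(1000) GROUP BY k")
df.collect()  // triggers optimization and planning, which the tracker records

// Phase timings (parsing, analysis, optimization, planning) in milliseconds.
val tracker = df.queryExecution.tracker
tracker.phases.foreach { case (phase, summary) =>
  println(s"$phase took ${summary.durationMs} ms")
}

// Top rules by time spent, with invocation counts -- the same data the PR
// proposes to expose on the SQL UI and through the REST API.
tracker.topRulesByTime(15).foreach { case (rule, summary) =>
  println(s"$rule: ${summary.totalTimeNs / 1e6} ms, " +
    s"${summary.numInvocations} invocations, " +
    s"${summary.numEffectiveInvocations} effective")
}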

Does this PR introduce any user-facing change?

Yes, on the UI.

How was this patch tested?

Tested locally and added a class to test the new endpoint.

yliou avatar Mar 22 '22 18:03 yliou

SPARK38617.txt is in rich text format. It would be easier to review if it was plain text.

martin-g avatar Mar 22 '22 19:03 martin-g

https://github.com/apache/spark/pull/35856 has tried similar things, although at present it misses the details of the compile time for each phase. Its scope also includes AQE; I think it is good to go.

ulysses-you avatar Mar 23 '22 01:03 ulysses-you

SPARK38617.txt is in rich text format. It would be easier to review if it was plain text.

Attached a new file in plain-text format.

yliou avatar Mar 23 '22 21:03 yliou

#35856 has tried similar things, although at present it misses the details of the compile time for each phase. Its scope also includes AQE; I think it is good to go.

What does "good to go" mean here, as I don't see an approval on #35856? From what I can see, the changes in #35856 are different from and complementary to this PR: that PR adds information about AQE plan description changes, while this PR adds timing of Spark phases and rule timing information. The changes in both PRs will help with troubleshooting.

yliou avatar Mar 24 '22 00:03 yliou

cc @jchen5

sigmod avatar Mar 24 '22 00:03 sigmod

@yliou I agree both can help with troubleshooting. And I have mentioned the difference: it misses the details of the compile time for each phase. That PR introduces a disk-based store, so I'm wondering whether we can re-use the same code path. IMO it would be better to go with that PR first, at least.

ulysses-you avatar Mar 24 '22 01:03 ulysses-you

+CC @shardulm94, @thejdeep

mridulm avatar Mar 24 '22 02:03 mridulm

Can we use more meaningful JSON field names? E.g., value -> timeMs, name -> phase:

{
  "phase" : "optimization",
  "timeMs" : "373"
}, {
  "phase" : "analysis",
  "timeMs" : "12"
} ],
"ruleInfo" : [ {
  "ruleName" : "org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation",
  "timeMs" : "32.659566",
  "numInvocations" : 5,
  "numEffectiveInvocations" : 3
},
....
]

Also, can timeMs be a number instead of a string?

sigmod avatar Mar 24 '22 08:03 sigmod
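For reference, a small sketch of response classes that would match the field names proposed above; the class names here are illustrative and not taken from the PR.

// Illustrative only: these case classes mirror the JSON shape suggested in
// the review comment above; they are not the PR's actual API classes.
case class PhaseTiming(phase: String, timeMs: Long)

case class RuleTiming(
    ruleName: String,
    timeMs: Double,  // numeric rather than a string, per the comment above
    numInvocations: Long,
    numEffectiveInvocations: Long)

case class PlanningTimings(
    phaseInfo: Seq[PhaseTiming],
    ruleInfo: Seq[RuleTiming])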

@ulysses-you got it. I think it is possible to re-use the same code path, but I'm not sure how that would work with ExecutionPage.scala to get the info onto the UI.

yliou avatar Mar 24 '22 21:03 yliou

Can we use more meaningful JSON field names? E.g., value -> timeMs, name -> phase:

{
  "phase" : "optimization",
  "timeMs" : "373"
}, {
  "phase" : "analysis",
  "timeMs" : "12"
} ],
"ruleInfo" : [ {
  "ruleName" : "org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation",
  "timeMs" : "32.659566",
  "numInvocations" : 5,
  "numEffectiveInvocations" : 3
},
....
]

Also, can timeMs be a number instead of a string?

Updated the JSON field names and changed timeMs to be a number.

yliou avatar Mar 24 '22 23:03 yliou

@ulysses-you @yliou adding debug information is always good, but I'm worried about two issues:

  1. We should be careful when adding information to the live UI's in-memory KV store; sooner or later, it will bring headaches.
  2. We should be careful when adding new REST API endpoints. Since both PRs are for troubleshooting, maybe we can use the same endpoint.

linhongliu-db avatar Mar 28 '22 02:03 linhongliu-db
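To make the first concern concrete: data shown on the live UI is typically kept in the UI's KVStore, so every new field grows that store for each query execution. A rough sketch of what storing planning timings there might look like, assuming Spark's kvstore module; the entity class and its fields are hypothetical and not taken from either PR.

import org.apache.spark.util.kvstore.{InMemoryStore, KVIndex}

// Hypothetical store entity: one record of planning timings per execution.
class PlanningTimingData(
    val executionId: Long,
    val phaseTimesMs: Map[String, Long]) {
  @KVIndex
  def id: Long = executionId  // natural key used to look the record up
}

val store = new InMemoryStore()  // the live UI keeps its data in a store like this
store.write(new PlanningTimingData(1L, Map("analysis" -> 12L, "optimization" -> 373L)))
val timings = store.read(classOf[PlanningTimingData], 1L)
println(timings.phaseTimesMs)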

Updated the PR to use more concise rule names so that the size of Spark event logs isn't impacted as much as before.

yliou avatar May 25 '22 19:05 yliou
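Presumably the shortening referred to here is along the lines of dropping the package prefix from fully qualified rule names before they are written to the event log; a trivial sketch (the helper is hypothetical, not the PR's code):

// Hypothetical helper: keep only the simple class name of a Catalyst rule so
// that per-rule entries in the event log stay small.
def shortRuleName(fullyQualifiedName: String): String =
  fullyQualifiedName.substring(fullyQualifiedName.lastIndexOf('.') + 1)

// "org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation" becomes
// "ConvertToLocalRelation".
assert(shortRuleName("org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation") ==
  "ConvertToLocalRelation")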

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions[bot] avatar Dec 02 '22 00:12 github-actions[bot]

@dependabot reopen

yliou avatar Apr 04 '23 17:04 yliou

Hi @ulysses-you @linhongliu-db, now that https://github.com/apache/spark/pull/35856 has been reverted and https://github.com/apache/spark/pull/38567 has been merged, should I create another pull request for this feature to try to get it merged?

yliou avatar Jun 12 '23 21:06 yliou