[SPARK-38617][SQL][WEBUI] Show Spark rule and phase timings in SQL UI and REST API
What changes were proposed in this pull request?
This PR adds a clickable field on the SQL UI that shows Spark phase timings and rule timing information from QueryPlanningTracker, along with a separate REST API endpoint that exposes the same timing statistics. It also adds a SQLConf to control the number of rules that show up on the UI and in the REST endpoint response (default value is 15).
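For context, such a config would typically be registered in SQLConf following the existing pattern; the key, version, and doc string below are illustrative assumptions, not necessarily what the PR uses:

```scala
// In org.apache.spark.sql.internal.SQLConf -- illustrative sketch only;
// the config key and version are assumptions, not the PR's actual values.
val UI_NUM_RULES_SHOWN =
  buildConf("spark.sql.ui.numRulesShown")
    .doc("Maximum number of rule timing entries shown on the SQL UI " +
      "and returned by the REST API endpoint.")
    .version("3.4.0")
    .intConf
    .checkValue(_ > 0, "The number of rules shown must be positive.")
    .createWithDefault(15)
```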
UI Change:
New REST API endpoint output:
SPARK-38617RestOutput.txt
Why are the changes needed?
Currently, users cannot see Spark phase timing information on the UI, or which Spark rules take the most time for a specific query execution. This PR adds that information.
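For context, QueryPlanningTracker already collects this data during query compilation; below is a minimal sketch of reading it (the helper function is hypothetical, while phases, topRulesByTime, and the summary fields are existing Catalyst APIs):

```scala
import org.apache.spark.sql.catalyst.QueryPlanningTracker

// Minimal sketch (not the PR's actual code): dump the phase and rule
// timings that QueryPlanningTracker records for a single query.
def printTimings(tracker: QueryPlanningTracker, maxRules: Int): Unit = {
  tracker.phases.foreach { case (phase, p) =>
    println(s"phase=$phase timeMs=${p.durationMs}")
  }
  tracker.topRulesByTime(maxRules).foreach { case (rule, r) =>
    println(s"rule=$rule timeMs=${r.totalTimeNs / 1e6} " +
      s"invocations=${r.numInvocations} effective=${r.numEffectiveInvocations}")
  }
}
```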
Does this PR introduce any user-facing change?
Yes, on the UI.
How was this patch tested?
Tested locally and added a class to test the new endpoint.
SPARK38617.txt is in rich text format. It would be easier to review if it was plain text.
https://github.com/apache/spark/pull/35856 has tried similar things, although at present it misses the details of the compile time for each phase. And its scope also includes AQE; I think it is good to go.
> SPARK38617.txt is in rich text format. It would be easier to review if it was plain text.
Attached new file with plain text format.
> #35856 has tried similar things, although at present it misses the details of the compile time for each phase. And its scope also includes AQE; I think it is good to go.
What does "good to go" mean here, as I don't see an approval on #35856? From what I see, the changes in #35856 are different from and complementary to this PR: that PR adds information regarding AQE plan description changes, while this PR adds Spark phase timings and rule timing information. The changes in both PRs will help with troubleshooting.
cc @jchen5
@yliou I agree both can help with troubleshooting. And I have mentioned the difference: it misses the details of the compile time for each phase. That PR introduces a disk-based store, so I'm wondering if we can re-use the same code path. IMO it would be better to go with that PR first, at least.
+CC @shardulm94, @thejdeep
Can we use more meaningful JSON field names? E.g., value -> timeMs, name -> phase:

{
  "phase" : "optimization",
  "timeMs" : "373"
}, {
  "phase" : "analysis",
  "timeMs" : "12"
} ],
"ruleInfo" : [ {
  "ruleName" : "org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation",
  "timeMs" : "32.659566",
  "numInvocations" : 5,
  "numEffectiveInvocations" : 3
},
....
]

Also, can timeMs be a number instead of a string?
@ulysses-you got it. I think it is possible to re-use the same code path, but I'm not sure how that would work with ExecutionPage.scala to get the info on the UI.
Updated the JSON field names and changed timeMs to be a number.
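For illustration, the updated shape maps naturally onto response classes like these (the class names are hypothetical; declaring timeMs as a numeric type is what makes it serialize as a JSON number rather than a string):

```scala
// Hypothetical response classes mirroring the renamed JSON fields;
// numeric timeMs fields serialize as JSON numbers, not strings.
case class PhaseTiming(phase: String, timeMs: Long)
case class RuleTiming(
    ruleName: String,
    timeMs: Double,
    numInvocations: Long,
    numEffectiveInvocations: Long)
```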
@ulysses-you @yliou adding debug information is always good. But I'm worried about two issues:
- we should be careful when adding information to the live UI's in-memory KV store; sooner or later, it will bring headaches
- we should be careful when adding new REST API endpoints. Since both PRs are for troubleshooting, maybe we can use the same endpoint.
Updated the PR to use more concise rule names so that the size of Spark event logs isn't impacted as much as before.
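One way such a trimming could work (purely illustrative; the PR's actual change may differ) is to record the simple class name instead of the fully qualified one:

```scala
// Illustrative only: dropping the package prefix makes each rule entry
// recorded in the event log much shorter.
val fullName  = "org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation"
val shortName = fullName.substring(fullName.lastIndexOf('.') + 1)
// shortName == "ConvertToLocalRelation"
```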
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
@dependabot reopen
Hi @ulysses-you @linhongliu-db now that https://github.com/apache/spark/pull/35856 got reverted and https://github.com/apache/spark/pull/38567 got merged, should I create another pull request for this feature to try to get it merged?