dr-elephant Initial changes for adding Spark stage analysis

Open edwinalu opened this issue 6 years ago • 2 comments

Add StageAnalyzer, which analyzes the stages of a Spark application for execution memory spill, long tasks, task skew, and failures.
Call the REST API to get failed tasks.
Modify the call to the stages REST API to get task and executor summaries.

Sep 21 '18 03:09 edwinalu

Some of the thresholds are meant to be set with heuristic configuration parameters. The stage analysis can be used for multiple heuristics (long task, task skew, execution memory spill, configuration parameter recommendations). Does it make sense to set these thresholds for each heuristic (and call StageAnalysis each time), or would it be better to consolidate? With the independent configuration parameters, users can decide which ones to use/include. However, keeping the values in sync across multiple heuristics seems awkward.

Perhaps this could be multi-level, with a general Spark (or Pig) configuration parameter list that kicks in when there is no heuristic-level setting. This could still be confusing if misconfigured, though.
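
A minimal sketch of the multi-level lookup described above, written in Python for brevity. It is not code from this change: the function name `resolve_threshold`, the parameter names, and the default values are all hypothetical. The idea is that a heuristic-level setting wins, the general list is consulted next, and a built-in default applies last.

```python
from typing import Mapping

# Hypothetical built-in defaults; the names and values are illustrative only.
BUILT_IN_DEFAULTS = {
    "task_skew_threshold": 2.0,
    "long_task_threshold_ms": 900000.0,
}


def resolve_threshold(
    name: str,
    heuristic_params: Mapping[str, str],
    general_params: Mapping[str, str],
    defaults: Mapping[str, float] = BUILT_IN_DEFAULTS,
) -> float:
    """Return the first value found for `name`, checking the heuristic-level
    parameters, then the general parameter list, then the built-in defaults."""
    for source in (heuristic_params, general_params, defaults):
        if name in source:
            return float(source[name])
    raise KeyError(f"no value configured for threshold {name!r}")


# General (application-type level) settings shared by all heuristics.
general = {"task_skew_threshold": "3.0"}

# One heuristic overrides the shared value; another sets nothing.
task_skew_heuristic = {"task_skew_threshold": "1.5"}
long_task_heuristic = {}

skew_value = resolve_threshold("task_skew_threshold", task_skew_heuristic, general)
shared_value = resolve_threshold("task_skew_threshold", long_task_heuristic, general)
default_value = resolve_threshold("long_task_threshold_ms", long_task_heuristic, general)
```

With this layout, a value set only in the general list stays in sync across heuristics, while an explicit heuristic-level setting still takes precedence. A misconfigured override would shadow the general value silently, which is the confusion risk mentioned above.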

Oct 16 '18 00:10 edwinalu

The Dr-elephant Compare page does not display any information. How should it be configured? @edwinalu @chriseppstein

May 14 '19 02:05 fusonghe
