Initial changes for adding Spark stage analysis

Open edwinalu opened this issue 6 years ago • 2 comments

  • Add StageAnalyzer, which analyzes the stages of a Spark application for execution memory spill, long tasks, task skew, and failures.
  • Call REST API for getting failed tasks.
  • Modify call to stages REST API to get task and executor summaries.
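
As a rough illustration of the kind of per-stage checks listed above, here is a minimal Python sketch. It is not the StageAnalyzer from this change: the function name, the input shape, and the default thresholds are all hypothetical, and it covers only task skew, long tasks, and failures (execution memory spill is left out).

```python
from statistics import median

# Hypothetical defaults for illustration only; not Dr. Elephant's actual values.
DEFAULT_SKEW_RATIO = 2.0                 # max task time / median task time
DEFAULT_LONG_TASK_MS = 30 * 60 * 1000    # 30 minutes


def analyze_stage(task_durations_ms, failed_tasks=0,
                  skew_ratio=DEFAULT_SKEW_RATIO,
                  long_task_ms=DEFAULT_LONG_TASK_MS):
    """Flag one stage for task skew, long tasks, and failures (sketch)."""
    if not task_durations_ms:
        # No task data: only the failure count can be reported.
        return {"skew": False, "long_tasks": False,
                "failures": failed_tasks > 0}

    med = median(task_durations_ms)
    longest = max(task_durations_ms)
    return {
        # Skew: the slowest task runs much longer than the typical task.
        "skew": med > 0 and longest / med >= skew_ratio,
        # Long tasks: even the typical task exceeds the threshold.
        "long_tasks": med >= long_task_ms,
        "failures": failed_tasks > 0,
    }
```

In this sketch the inputs would come from the task summaries and failed-task counts returned by the REST calls; how those responses map to `task_durations_ms` and `failed_tasks` is an assumption.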

edwinalu, Sep 21 '18 03:09

Some of the thresholds are meant to be set with heuristic configuration parameters. The stage analysis can be used for multiple heuristics (long task, task skew, execution memory spill, configuration parameter recommendations). Does it make sense to set these thresholds for each heuristic (and call StageAnalysis each time), or would it be better to consolidate? With the independent configuration parameters, users can decide which ones to use/include. However, keeping the values in sync across multiple heuristics seems awkward.

Perhaps this could be multi-level, with a general Spark (or Pig) configuration parameter list, which would kick in if there isn't a heuristic-level setting. This could still be confusing if misconfigured though.
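
The multi-level lookup could be sketched roughly as follows, in Python for brevity. The parameter names, the default values, and the `resolve_threshold` helper are hypothetical and only illustrate the fallback order: heuristic-level setting first, then the general Spark-level list, then a built-in default.

```python
# Hypothetical built-in defaults; the names and values are illustrative only.
BUILTIN_DEFAULTS = {
    "task_skew_ratio": 2.0,
    "long_task_minutes": 30.0,
}


def resolve_threshold(name, heuristic_params, general_params,
                      defaults=BUILTIN_DEFAULTS):
    """Return the first value found for `name`, most specific level first."""
    for level in (heuristic_params, general_params, defaults):
        if name in level:
            return float(level[name])
    raise KeyError(f"no value configured for threshold {name!r}")


# The heuristic-level value wins when present; otherwise the general
# Spark-level value applies, and finally the built-in default.
skew = resolve_threshold(
    "task_skew_ratio",
    heuristic_params={"task_skew_ratio": "3"},
    general_params={"task_skew_ratio": "4"},
)
```

With a single lookup like this, each heuristic reads its thresholds the same way, so shared values only need to be set once at the general level, while a heuristic-level entry can still override them.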

edwinalu, Oct 16 '18 00:10

The Dr. Elephant Compare page does not display any information. How should it be configured? @edwinalu @chriseppstein

fusonghe, May 14 '19 02:05