Spark 2.3 is now out. Will Dr. Elephant support it out of the box?
https://spark.apache.org/releases/spark-release-2-3-0.html
So I've been reading about how LinkedIn has been using a modified SHS to pull from Spark 1.x and 2.x.
But if you were to install Spark 2.3, will Dr. Elephant just work with it out of the box?
Has anyone tried Spark 2.3 with Dr. Elephant yet?
I think Spark 2.3 is a requirement to analyze Spark 2.x jobs, as per https://github.com/linkedin/dr-elephant/issues/327, or at least you would need to build a custom SHS, as that was a prerequisite. See https://issues.apache.org/jira/browse/SPARK-18085 for details.
SPARK-18085 brings LevelDB storage to the Spark History Server (SHS), which helps Dr. Elephant gather metrics because it improves overall SHS performance. If you have an earlier version of SHS that keeps all data in memory, Dr. Elephant can still gather metrics from it, since the REST fetcher in Dr. Elephant calls the same REST APIs. If you don't have a large number of applications per day, an earlier SHS version might still work for you.
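For anyone who wants to check what their SHS returns before pointing Dr. Elephant at it, here is a minimal sketch of calling the same kind of REST endpoints. It assumes the standard Spark monitoring paths under `/api/v1` and the default SHS port 18080. The host name and the helper names (`applications_url`, `executors_url`, `fetch_json`) are made up for illustration and are not Dr. Elephant's actual fetcher code.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def applications_url(shs_base, limit=None):
    """Build the URL that lists applications known to the history server."""
    url = shs_base.rstrip("/") + "/api/v1/applications"
    if limit is not None:
        # `limit` caps the number of applications returned.
        url += "?" + urlencode({"limit": limit})
    return url


def executors_url(shs_base, app_id):
    """Build the URL for the executor summaries of one application."""
    return "{}/api/v1/applications/{}/executors".format(
        shs_base.rstrip("/"), quote(app_id, safe="")
    )


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON body (needs a reachable SHS)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical host; replace with your own history server.
    base = "http://shs.example.com:18080"
    print(applications_url(base, limit=5))
    print(executors_url(base, "application_1_0001"))
```

If `fetch_json(applications_url(base))` responds on your existing SHS, the REST side is reachable. Whether it holds up under your daily application volume is a separate question.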
But since we added new metrics in Spark (code changes in the Executor, Driver, and SHS, as described in ticket SPARK-23206), you might not be able to get some of them. The reason you need a custom SHS is that our PR for these new metrics has not been merged yet. We are targeting Spark 2.3.1 and Spark 2.4. If you run your own SHS without these changes, Dr. Elephant will not be able to gather some of the new metrics.
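To illustrate what "not able to gather some new metrics" can look like from the client side, here is a rough sketch that treats the new metrics as an optional field on each executor summary. The field names `peakMemoryMetrics` and `JVMHeapMemory` are assumptions on my part, so check them against the JSON your SHS build actually returns. `summarize_executor` is a hypothetical helper, not part of Dr. Elephant.

```python
def summarize_executor(executor):
    """Summarize one executor entry, tolerating a missing metrics field.

    `executor` is a dict decoded from the SHS executors endpoint.
    ASSUMPTION: the new metrics live under "peakMemoryMetrics", and an
    unpatched SHS simply omits that key.
    """
    peak = executor.get("peakMemoryMetrics")
    return {
        "id": executor.get("id"),
        "has_peak_metrics": peak is not None,
        # None when the SHS does not report the new metrics.
        "peak_jvm_heap": (peak or {}).get("JVMHeapMemory"),
    }


if __name__ == "__main__":
    unpatched = {"id": "1", "memoryUsed": 1024}
    patched = {"id": "2", "peakMemoryMetrics": {"JVMHeapMemory": 2048}}
    for entry in (unpatched, patched):
        print(summarize_executor(entry))
```

The point is only that a client can still read the older fields and skip heuristics that depend on the missing ones.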
Internally we are using the Spark 2.3 History Server with the above PR applied. There are some other patches for some SHS issues, but those are not blockers for using Dr. Elephant.
Thank you @zhouyejoe for this information - really helpful as we will be checking out Dr. Elephant soon too.
@zhouyejoe does your team/LinkedIn have any plans to open source the Spark 2.3/2.4 patches you mentioned above?
Those PRs have been merged into Spark trunk. Please take a look at the comments in https://issues.apache.org/jira/browse/SPARK-23206 for more details.