
Is there support for Spark 2.4.2?

Open kiranch2004 opened this issue 5 years ago • 2 comments

I get the following error message when running against Spark 2.4.2:

11-18-2020 04:58:26 INFO [Thread-10] com.linkedin.drelephant.ElephantRunner : Job queue size is 0
11-18-2020 04:58:26 INFO [dr-el-executor-thread-2] com.linkedin.drelephant.ElephantRunner : Analyzing SPARK application_1605673758178_0001
11-18-2020 04:58:26 ERROR [dr-el-executor-thread-2] com.linkedin.drelephant.ElephantRunner : Failed to analyze SPARK application_1605673758178_0001
java.lang.IllegalArgumentException: java.net.UnknownHostException: null
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:435)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.initialize(WebHdfsFileSystem.java:239)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2859)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2896)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2878)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:392)
    at com.linkedin.drelephant.util.SparkUtils$class.fileSystemAndPathForEventLogDir(SparkUtils.scala:70)
    at com.linkedin.drelephant.util.SparkUtils$.fileSystemAndPathForEventLogDir(SparkUtils.scala:312)
    at org.apache.spark.deploy.history.SparkFSFetcher.doFetchData(SparkFSFetcher.scala:84)
    at org.apache.spark.deploy.history.SparkFSFetcher$$anonfun$fetchData$1.apply(SparkFSFetcher.scala:74)
    at org.apache.spark.deploy.history.SparkFSFetcher$$anonfun$fetchData$1.apply(SparkFSFetcher.scala:74)
    at org.apache.spark.deploy.history.SparkFSFetcher$$anon$1.run(SparkFSFetcher.scala:78)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:360)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1824)
    at com.linkedin.drelephant.security.HadoopSecurity.doAs(HadoopSecurity.java:109)
    at org.apache.spark.deploy.history.SparkFSFetcher.doAsPrivilegedAction(SparkFSFetcher.scala:78)
    at org.apache.spark.deploy.history.SparkFSFetcher.fetchData(SparkFSFetcher.scala:74)
    at com.linkedin.drelephant.spark.fetchers.FSFetcher.fetchData(FSFetcher.scala:34)
    at com.linkedin.drelephant.spark.fetchers.FSFetcher.fetchData(FSFetcher.scala:29)
    at com.linkedin.drelephant.analysis.AnalyticJob.getAnalysis(AnalyticJob.java:308)
    at com.linkedin.drelephant.ElephantRunner$ExecutorJob.run(ElephantRunner.java:390)
    at com.linkedin.drelephant.priorityexecutor.RunnableWithPriority$1.run(RunnableWithPriority.java:36)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: null

kiranch2004, Nov 18 '20 05:11

I have the same question/problem. A little more background on my use case:

I set up Dr. Elephant using this tutorial: https://aws.amazon.com/blogs/big-data/tune-hadoop-and-spark-performance-with-dr-elephant-and-sparklens-on-amazon-emr/

The Spark History Server is configured using a persistent UI: https://docs.aws.amazon.com/emr/latest/ManagementGuide/app-history-spark-UI.html

So the Spark History Server is configured, but it does not run on the same cluster as the job I am trying to analyze; it is an external endpoint (which is reachable). I tried the suggested updates like this (hoping to bypass the connection URL issue):

   <fetcher>
     <applicationtype>spark</applicationtype>
     <classname>com.linkedin.drelephant.spark.fetchers.SparkFetcher</classname>
     <params>
       <event_log_location_uri>webhdfs:///var/log/spark/apps</event_log_location_uri>
       <use_rest_for_eventlogs>true</use_rest_for_eventlogs>
       <should_process_logs_locally>true</should_process_logs_locally>
     </params>
   </fetcher>
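
One thing worth checking about the `event_log_location_uri` above: the value `webhdfs:///var/log/spark/apps` has an empty authority, so a generic URI parser sees no host in it at all. The sketch below only illustrates that parsing behavior with Python's standard library; it is not Dr. Elephant or Hadoop code, the second hostname and port are hypothetical placeholders, and whether the missing host is what actually produces `UnknownHostException: null` here is an assumption, not something confirmed in this thread.

```python
from urllib.parse import urlsplit


def host_of(uri: str):
    """Return the host component of a URI, or None when the authority is empty."""
    return urlsplit(uri).hostname


# The value used in the fetcher config: three slashes, so no authority section.
no_host = host_of("webhdfs:///var/log/spark/apps")

# Hypothetical variant with an explicit NameNode host and port (placeholder values).
with_host = host_of("webhdfs://namenode.example.internal:50070/var/log/spark/apps")

print(no_host)
print(with_host)
```

If the missing host does turn out to matter, spelling out the NameNode host and port in the URI would be the obvious thing to try.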

I also tried setting spark.yarn.historyServer.address to each of these:

	spark.yarn.historyServer.address persistentEndpoint.emrappui-prod.us-east-1.amazonaws.com
	spark.yarn.historyServer.address persistentEndpoint.emrappui-prod.us-east-1.amazonaws.com/shs
	spark.yarn.historyServer.address persistentEndpoint.emrappui-prod.us-east-1.amazonaws.com/shs/
	spark.yarn.historyServer.address persistentEndpoint.emrappui-prod.us-east-1.amazonaws.com:443/shs
	spark.yarn.historyServer.address persistentEndpoint.emrappui-prod.us-east-1.amazonaws.com:443/shs/

However, I get either:

  • Caused by: java.net.UnknownHostException: null, because it's not the REST API endpoint
  • "Requirement Failed", I think because the value contains a path (the forward slash followed by shs).
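
To make the path hypothesis concrete, here is a small sketch that parses each address tried above and reports whether it carries a path component. It assumes, without confirmation from the Dr. Elephant source, that the "Requirement Failed" message comes from a check that the history server address has an empty path; `has_path` is a hypothetical helper written for this illustration, not a function from Dr. Elephant.

```python
from urllib.parse import urlsplit

HOST = "persistentEndpoint.emrappui-prod.us-east-1.amazonaws.com"

# The five values tried for spark.yarn.historyServer.address.
CANDIDATES = [
    HOST,
    HOST + "/shs",
    HOST + "/shs/",
    HOST + ":443/shs",
    HOST + ":443/shs/",
]


def has_path(address: str) -> bool:
    """True when the address has anything after host[:port]."""
    # Prepend a scheme when none is given so the host parses as the authority.
    uri = address if "://" in address else "http://" + address
    return urlsplit(uri).path != ""


for candidate in CANDIDATES:
    print(candidate, "->", "has path" if has_path(candidate) else "no path")
```

Only the first value parses with an empty path, so under this assumption it is the only one that would pass such a check.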

@ShubhamGupta29, have you seen this, or do you have any guidance?

tchangalov, Mar 25 '21 19:03

Also, I think the HTTPS protocol is not working, since I am now getting a redirect error.

tchangalov, Apr 01 '21 21:04