dr-elephant
Spark 2.X applications could not be analyzed?
I built the project with compile.conf set to hadoop_version=2.6.0 and spark_version=1.6.0.
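For reference, this is roughly what that build setup looks like; the file contents match the versions stated above, and the `./compile.sh` invocation is the usual Dr. Elephant build entry point (treat the exact script path as an assumption for your checkout):

```shell
# compile.conf -- version pins used for this (working) Spark 1.X build
hadoop_version=2.6.0
spark_version=1.6.0
```

```shell
# build, run from the root of the dr-elephant checkout
./compile.sh compile.conf
```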
I run my Spark app on Spark 2.X. There are no error messages in dr_elephant.log, but in my Dr. Elephant web UI nothing shows up for the application. So, do I need to compile the project with Hadoop 2.X and Spark 2.X? That runs into other build problems. What should I do?
I changed my compile.conf to specify spark_version=2.1.1, but the build fails. Does Dr. Elephant not support Spark 2.X?
Dr. Elephant does not support Spark 2.X.
@shkhrgpt do you mean the fetcher is not able to parse it or Dr. E itself would not build with 2.X?
Dr. Elephant will not build with Spark 2.X after SparkFSFetcher was added, because of the Spark listener classes. The workaround would be to build with Spark 1.X and use it against Spark 2.X, but that only works as long as the Spark REST fetcher is used.
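The REST-fetcher workaround corresponds to selecting the REST-based Spark fetcher in app-conf/FetcherConf.xml instead of the filesystem-based SparkFSFetcher. A minimal sketch, assuming the `SparkFetcher` class name from the dr-elephant tree (verify it against your version):

```xml
<fetchers>
  <!-- REST-based Spark fetcher: reads application data via the Spark
       History Server REST API, so a Dr. Elephant binary built against
       Spark 1.X can still fetch Spark 2.X applications. -->
  <fetcher>
    <applicationtype>spark</applicationtype>
    <classname>com.linkedin.drelephant.spark.fetchers.SparkFetcher</classname>
  </fetcher>
</fetchers>
```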
I see, thanks for clarifying this.
@akshayrai why don't we create another branch to support Spark 2.X in Dr. Elephant?
@BruceXu1991 I don't think we need a separate branch for Spark 2.X. Even in the same branch, we should add Spark 2.X support without breaking Spark 1.X.
Agreed; we should add Spark 2.X support rather than maintain a separate branch for it.
Is anyone working on this?
The folks at Pepperdata said they would contribute this back, but there is no sign of it yet.
I am working on modifying it to support Spark 2.0.
I tried the Spark REST fetcher with Spark 2.2; I get the following exception, and the metrics all have zero values:
[error] o.a.s.s.ReplayListenerBus - Exception parsing Spark event log: application_1510469066221_0020
org.json4s.package$MappingException: Did not find value which can be converted into boolean
    at org.json4s.reflect.package$.fail(package.scala:96) ~[org.json4s.json4s-core_2.10-3.2.10.jar:3.2.10]
    at org.json4s.Extraction$.convert(Extraction.scala:554) ~[org.json4s.json4s-core_2.10-3.2.10.jar:3.2.10]
    at org.json4s.Extraction$.extract(Extraction.scala:331) ~[org.json4s.json4s-core_2.10-3.2.10.jar:3.2.10]
    at org.json4s.Extraction$.extract(Extraction.scala:42) ~[org.json4s.json4s-core_2.10-3.2.10.jar:3.2.10]
    at org.json4s.ExtractableJsonAstNode.extract(ExtractableJsonAstNode.scala:21) ~[org.json4s.json4s-core_2.10-3.2.10.jar:3.2.10]
    at org.apache.spark.util.JsonProtocol$.storageLevelFromJson(JsonProtocol.scala:881) ~[org.apache.spark.spark-core_2.10-1.6.3.jar:1.6.3]
[error] o.a.s.s.ReplayListenerBus - Malformed line #37: {"Event":"SparkListenerJobStart","Job ID":0,"Submission Time":1511486140513,"Stage Infos":[{"Stage ID":0,"Stage Attempt ID":0,"Stage Name":"collect at /home/hadoop/classify_image_pyspark_emr.py:185","Number of Tasks":512,"RDD Info":[{"RDD ID":1,"Name":"PythonRDD","Callsite":"collect at /home/hadoop/classify_image_pyspark_emr.py:185","Parent IDs":[0],"Storage Level":{"Use Disk":false,"Use Memory":false,"Deserialized":false,"Replication":1},"Number of Partitions":512,"Number of Cached Partitions":0,"Memory Size":0,"Disk Size":0},{"RDD ID":0,"Name":"ParallelCollectionRDD","Scope":"{"id":"0","name":"parallelize"}","Callsite":"parallelize at PythonRDD.scala:480","Parent IDs":[],"Storage Level":{"Use Disk":false,"Use Memory":false,"Deserialized":false,"Replication":1},"Number of Partitions":512,"Number of Cached Partitions":0,"Memory Size":0,"Disk Size":0}],"Parent IDs":[],"Details":"org.apache.spark.rdd.RDD.collect(RDD.scala:935)\norg.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:458)\norg.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)\nsun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\nsun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\nsun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\njava.lang.reflect.Method.invoke(Method.java:498)\npy4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)\npy4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\npy4j.Gateway.invoke(Gateway.java:280)\npy4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)\npy4j.commands.CallCommand.execute(CallCommand.java:79)\npy4j.GatewayConnection.run(GatewayConnection.java:214)\njava.lang.Thread.run(Thread.java:748)","Accumulables":[]}],"Stage IDs":[0],"Properties":{"spark.rdd.scope.noOverride":"true","callSite.short":"collect at /home/hadoop/classify_image_pyspark_emr.py:185","spark.rdd.scope":"{"id":"1","name":"collect"}"}}
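Note the jar names in the trace: the replay is done by the Spark 1.6.3 `JsonProtocol`, while the log was written by Spark 2.x. The `storageLevelFromJson` failure is most likely a schema mismatch in the "Storage Level" object: the 1.6 reader expects an extra boolean field (`"Use ExternalBlockStore"`) that Spark 2.x logs no longer emit, hence "Did not find value which can be converted into boolean". A small sketch of that mismatch (the field list is taken from the Spark 1.6-era event-log format; verify against your Spark sources):

```python
import json

# "Storage Level" entry exactly as it appears in the Spark 2.x
# event log from the error above.
spark2_storage_level = json.loads(
    '{"Use Disk": false, "Use Memory": false,'
    ' "Deserialized": false, "Replication": 1}'
)

# Fields the Spark 1.6 JsonProtocol.storageLevelFromJson reader expects;
# the extra boolean is absent from Spark 2.x logs, which is the likely
# trigger of the MappingException.
SPARK_16_FIELDS = ["Use Disk", "Use Memory", "Use ExternalBlockStore",
                   "Deserialized", "Replication"]

missing = [f for f in SPARK_16_FIELDS if f not in spark2_storage_level]
print(missing)  # the field(s) a 1.6-built Dr. Elephant cannot find
```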
Is anybody here analyzing Spark 2.X jobs in Dr. Elephant? How is the Spark 2.X SparkListener blocking event reading and analysis here?
Please refer to #327 for the updates.
Let me look at your app-conf/FetcherConf.xml file.
I am also facing the same issue: I compiled Dr. Elephant with Spark 1.X and am analyzing Spark 2.X jobs.
I have solved this problem and can push it to git.
I solved this problem. My fork is at https://github.com/Hanqingkuo/dr.elephant-spark2.x
It would be nice if you opened a pull request here. @Hanqingkuo
@Hanqingkuo Thanks for sharing the code. I could compile Dr. Elephant and am now able to see jobs in the UI. However, the "Malformed line" error is still in the logs, and the heuristics do not seem to be captured correctly. Running on Spark 2.3.
Have you also faced such an issue? (Screen capture)
Probably there is no algorithm collecting those metrics; you need to write it yourself.