User did not initialize spark context!
**Used Spark version:** 2.4.0
**Used Spark Job Server version:** v0.11.0
**Deployed mode:** YARN cluster mode

**Actual (wrong) behavior:**
Spark context creation fails with the following error:

```
[2021-06-09 17:17:56,338] INFO loy.yarn.ApplicationMaster [] - Final app status: FAILED, exitCode: 13
[2021-06-09 17:17:56,348] ERROR loy.yarn.ApplicationMaster [] - Uncaught exception: java.lang.IllegalStateException: User did not initialize spark context!
	at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:465)
	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:276)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:821)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:820)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:820)
```

**Steps to reproduce:**
- Start SJS using server_start.sh
- Try to start a Spark context using the following request: `http://XXX:8090/contexts/spark_context?context-factory=spark.jobserver.context.SessionContextFactory&`

**Logs:**
- Conf file: env.conf.txt
- Spark Job Server log: spark-job-server.log.txt
- YARN context application log: spark-job-server.out.txt
It was working fine on SJS 0.9.
Do you write `master("local[*]")` or `setMaster("local[*]")` in your code?
I have the same problem. Have you solved it?
I was not able to resolve the issue. I am waiting for someone to confirm that Yarn cluster mode works with SJS 0.10 or above.
I tried SJS 0.11.1: I couldn't run YARN cluster mode, but YARN client mode worked.
> Do you write `master("local[*]")` or `setMaster("local[*]")` in your code?
No coding, just create a context with API.
e.g.
```shell
curl -d "" "localhost:8090/contexts/ctx-example?context-factory=spark.jobserver.context.SessionContextFactory&spark.executor.instances=2&spark.executor.cores=1&spark.executor.memory=1g&spark.driver.memory=1g"
```
@vglagoleva hello, has YARN cluster mode been fully tested on Spark 3.x (SJS 0.11.1)? If not, what's the plan?
@jimolonely SJS 0.11.1 does not support Spark 3 at all. There is an open pull request, which has not yet been reviewed by anyone.
Spark 2.4.2 + SJS 0.11.1 + YARN cluster mode: same error. Has Spark 2.x been tested?
Spark 2.4.2 is supported by SJS 0.11.1.
Regarding YARN: we never had any specific tests for YARN, because Jobserver has no special logic for it. In the end, Jobserver just uses the spark-submit command.
Nevertheless, if you run Jobserver in cluster mode, please make sure that the Jobserver binary is uploaded to a distributed store and that you are not using the default in-memory H2 backend.
It is very important that the backend is set up correctly and that your MANAGER_JAR_FILE variable points to the path of the file in HDFS/PostgreSQL/.. and not to a local path.
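For illustration, a minimal sketch of what that might look like in a deployment script. The HDFS paths and jar file name here are hypothetical examples, not values taken from this thread:

```shell
# Hypothetical deployment sketch for YARN cluster mode.
# First, upload the Jobserver binary to a location every node can reach, e.g.:
#   hdfs dfs -mkdir -p /jobserver && hdfs dfs -put spark-job-server.jar /jobserver/
# Then point MANAGER_JAR_FILE at the distributed copy, NOT at a local path:
MANAGER_JAR_FILE="hdfs:///jobserver/spark-job-server.jar"
echo "$MANAGER_JAR_FILE"
```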
Another thing to check is that you use the correct Scala version. By default, the current Jobserver master branch compiles for Scala 2.12. You may need to use `export SCALA_VERSION=2.11.8`. A mismatch of Scala versions may also cause unexpected errors.
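A quick way to check for such a mismatch (assuming `spark-submit` is on the PATH; the grep filter is just a convenience, and 2.11.8 is only an example for a Spark 2.4.x build):

```shell
# Print the Scala version your Spark build was compiled against, if Spark is installed.
if command -v spark-submit >/dev/null 2>&1; then
  spark-submit --version 2>&1 | grep -i 'scala version'
fi
# Then build Jobserver against the matching Scala line before starting it:
export SCALA_VERSION=2.11.8
echo "$SCALA_VERSION"
```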
I am not a YARN user myself, so I can't help you more.
Hi @vglagoleva, agreed. I suspect the akka-actor upgrade (SJS 0.9.0 was using the Spray server). What I have noticed is that the Akka actors are not able to communicate between the cluster node and the job server node (the Akka master).
Thanks @vglagoleva , Issue has been resolved and now we are able to initialize the context. Thanks for the quick response 🙂
How to solve it?
I found several warnings. Maybe this is the reason?
```
21/07/02 16:08:57 INFO JobManagerActor: Starting actor spark.jobserver.JobManagerActor
21/07/02 16:08:57 INFO ProductionReaper: Starting actor spark.jobserver.common.akka.actor.ProductionReaper
21/07/02 16:08:57 WARN JobDAOActor: Shutting down spark.jobserver.io.JobDAOActor
21/07/02 16:08:57 WARN ProductionReaper: Shutting down spark.jobserver.common.akka.actor.ProductionReaper
```
Hi @venkatkrishna110, can you please share the env conf and Spark properties you are using for YARN cluster mode?
Hi @vglagoleva @pgouda89, I have tentatively traced the failure to the ProductionReaper and JobDAOActor shutting down, but it is still unclear why they shut down.
Hi @hujian0923, as @pgouda89 mentioned, if you share your configuration files, maybe we can find some clues there. It's hard to say anything otherwise.
Configuration information:
```hocon
spark {
  master = "yarn"
  submit.deployMode = "cluster"
  job-number-cpus = 4
  jobserver {
    port = 8090
    context-per-jvm = true
    context-creation-timeout = 1000000 s
    yarn-context-creation-timeout = 1000000 s
    default-sync-timeout = 1000000 s
    short-timeout = 60 s
    max-jobs-per-context = 80
    jobdao = spark.jobserver.io.JobSqlDAO
    filedao {
      rootdir = /tmp/spark-jobserver/filedao/data
    }
    datadao {
      rootdir = /tmp/spark-jobserver/upload
    }
    sqldao {
      slick-driver = slick.jdbc.MySQLProfile
      jdbc-driver = com.mysql.jdbc.Driver
      rootdir = /tmp/spark-jobserver/sqldao/data
      jdbc {
        url = "jdbc:mysql://hadoop01:8100/jobserver?serverTimezone=Asia/Shanghai"
        user = "jobserver"
        password = "jobserver"
      }
      dbcp {
        enabled = false
        maxactive = 20
        maxidle = 10
        initialsize = 10
      }
    }
    result-chunk-size = 1m
  }
  context-settings {
    context-factory = "spark.jobserver.context.SessionContextFactory"
    num-cpu-cores = 1
    memory-per-node = 1G
    forked-jvm-init-timeout = 300 s
    context-init-timeout = 1000000 s
    passthrough {
      #es.nodes = "192.1.1.1"
    }
  }
}

akka.http.server {
  idle-timeout = 1200 s
  request-timeout = 1000 s
  parsing.max-content-length = 300m
}

flyway.locations = "db/mysql/migration"

akka {
  remote.netty.tcp {
    maximum-frame-size = 5120 MiB
    hostname = "hadoop01"
  }
}
```
I have the same problem.
> Thanks @vglagoleva , Issue has been resolved and now we are able to initialize the context. Thanks for the quick response 🙂
Could you share some ideas for the solution? Thanks @venkatkrishna110