chronos
Shutting down executor as the framework does not exist
I'm trying to schedule a Spark job on Mesos using Chronos. My setup is Mesos 0.28.1, Chronos 2.5.0 and Spark 1.6.
My problem is that Chronos runs the scheduled job, but the resulting Spark job doesn't run. The stderr log is as follows:
... I0222 16:27:31.999836 1691 exec.cpp:136] Version: 0.26.0
I0222 16:27:32.003273 1829 exec.cpp:383] Executor asked to shutdown
From the mesos.slave.INFO log, the root cause of the above problem seems to be this:
W0222 16:27:54.717996 924 slave.cpp:2431] Shutting down executor 'ct:1456129598000:0:test:' as the framework c1de6cda-b436-4f7f-bf7d-3f6b3d691e3d-0096 does not exist
where c1de6cda-b436-4f7f-bf7d-3f6b3d691e3d-0096 is the framework ID of Chronos on my Mesos cluster. Unfortunately, there is no further hint about what could cause this problem.
The job is as follows:
{
"schedule": "R1/2016-02-19T18:43:42.0+08:00/PT2S",
"name": "spark test",
"epsilon": "PT15M",
"command": "/opt/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master mesos://d-mesosspark-1:7077 --deploy-mode cluster --supervise --executor-memory 2G --driver-memory 1G --total-executor-cores 1 hdfs:///tmp/spark-examples-1.6.0-hadoop2.6.0.jar 1000",
"owner": "[email protected]",
"async": false
}
I can verify that the command above works when I execute it on a Mesos slave with /bin/sh -c.
How can I track down this problem? Does anyone have a working example of how to trigger Spark jobs correctly?
+1, seeing the same problem with Spark 2.0.2. @liuq4360, did you ever find a workaround?
+1
I am seeing the same issue with Spark 2.2 on Mesos 1.14. Any help here, anyone?
This is actually not a Chronos issue. The root cause is that when you start spark-submit from any Mesos task, spark-submit copies all Mesos environment variables from its sandbox and sends them to the dispatcher when launching the Spark driver for the new task.
When the Spark driver's Mesos executor starts, its Mesos environment variables are overwritten by spark-submit's environment variables, so the executor tries to register with the Mesos agent using the wrong information. This is consistent with the agent log in the question, which shuts the executor down under the Chronos framework ID.
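To see which variables are involved, a Chronos job can print the MESOS_* variables present in its own sandbox environment, which are the ones spark-submit would pick up. This is a minimal sketch assuming a POSIX shell; `list_mesos_env` is a hypothetical helper name, and the exact variables you see will depend on your Mesos version.

```sh
#!/bin/sh
# Hypothetical diagnostic helper: print every MESOS_* variable in the
# current environment, sorted by name. Used as (part of) a Chronos job
# command, it shows what a spark-submit launched from that task inherits.
list_mesos_env() {
  # The pipeline's exit status is sort's, so an empty result is not an error.
  env | grep '^MESOS_' | sort
}

list_mesos_env
```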
This bug hasn't been fixed in Spark yet, but there's a workaround you can use in the meantime: use the unset command before launching spark-submit to unset any MESOS_* environment variables, so they are not carried over into other Spark tasks. This should allow you to run spark-submit on Chronos/Mesos.
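The unset workaround can be sketched as a small wrapper script. This assumes a POSIX shell on the agent; the script name `clean-mesos-env.sh` and the function `unset_mesos_env` are hypothetical, not part of Spark, Mesos or Chronos. The script removes every inherited MESOS_* variable and then replaces itself with whatever command it was given.

```sh
#!/bin/sh
# Hypothetical wrapper (e.g. saved as clean-mesos-env.sh): drop every
# MESOS_* variable inherited from the Chronos task sandbox, then run the
# wrapped command (e.g. spark-submit) with the cleaned environment.
unset_mesos_env() {
  # Extract just the variable names (the part before '=') that start with MESOS_.
  for name in $(env | sed -n 's/^\(MESOS_[A-Za-z0-9_]*\)=.*/\1/p'); do
    unset "$name"
  done
}

unset_mesos_env

# Replace this shell with the wrapped command, if one was given.
if [ "$#" -gt 0 ]; then
  exec "$@"
fi
```

With the script installed at a path of your choosing on every agent, the job's command would become something like `/path/to/clean-mesos-env.sh /opt/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi ...` with the remaining spark-submit arguments unchanged. Using `exec` means spark-submit runs as the task's process instead of as a child of the wrapper shell.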