
Shutting down executor as the framework does not exist

Open liuq4360 opened this issue 8 years ago • 6 comments

I'm trying to schedule a Spark job on Mesos using Chronos. My setup is:

Mesos 0.28.1, Chronos 2.5.0, Spark 1.6

Chronos runs the schedule, but the resulting Spark job never runs. The stderr log is as follows:

... I0222 16:27:31.999836 1691 exec.cpp:136] Version: 0.26.0

I0222 16:27:32.003273 1829 exec.cpp:383] Executor asked to shutdown

According to the mesos.slave.INFO log, the root cause seems to be this:

W0222 16:27:54.717996 924 slave.cpp:2431] Shutting down executor 'ct:1456129598000:0:test:' as the framework c1de6cda-b436-4f7f-bf7d-3f6b3d691e3d-0096 does not exist

where c1de6cda-b436-4f7f-bf7d-3f6b3d691e3d-0096 is the UUID of Chronos on my Mesos cluster. Unfortunately, there is no further hint on what could cause this problem.

The job is as follows:

{
  "schedule": "R1/2016-02-19T18:43:42.0+08:00/PT2S",
  "name": "spark test",
  "epsilon": "PT15M",
  "command": "/opt/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master mesos://d-mesosspark-1:7077 --deploy-mode cluster --supervise --executor-memory 2G --driver-memory 1G --total-executor-cores 1 hdfs:///tmp/spark-examples-1.6.0-hadoop2.6.0.jar 1000",
  "owner": "[email protected]",
  "async": false
}

I can verify that the command above works when I run it on a mesos-slave using /bin/sh -c.

How can I track down this problem? Does anyone have a working example of how to trigger Spark jobs correctly?

liuq4360 avatar Jul 23 '16 05:07 liuq4360

+1 Seeing the same problem with Spark 2.0.2. @liuq4360 Did you ever find a workaround?

marktb1 avatar Dec 19 '16 16:12 marktb1

+1

calamari- avatar Feb 06 '17 22:02 calamari-

+1

yeoshim avatar Apr 17 '17 04:04 yeoshim

+1

srikanth-viswanathan avatar Aug 16 '17 16:08 srikanth-viswanathan

I am seeing the same issue with Spark 2.2 on Mesos 1.14. Can anyone help here?

bot-netizen avatar Nov 20 '17 19:11 bot-netizen

This is actually not a Chronos issue. The root cause is that when you start spark-submit from any Mesos task, spark-submit copies all Mesos environment variables from its sandbox and sends them to the dispatcher when launching the Spark driver for the new task.

When the Spark driver's Mesos executor starts, its own Mesos environment variables are overwritten by the ones inherited from spark-submit, so the executor tries to register with the Mesos agent using the wrong information.

This bug hasn't been fixed in Spark yet, but there is a workaround you can use in the meantime: use the unset command before launching spark-submit to unset any MESOS_* environment variables, so they are not carried over into other Spark tasks. This should allow you to run spark-submit on Chronos/Mesos.
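A minimal sketch of that workaround, assuming the job command runs under a POSIX shell (as with /bin/sh -c). The function name unset_mesos_env is hypothetical and not part of Chronos, Mesos, or Spark; it simply unsets every variable whose name starts with MESOS_, since the thread does not say which specific variables cause the conflict.

```shell
# Hypothetical helper (not part of Chronos, Mesos, or Spark):
# unset every environment variable whose name starts with MESOS_.
unset_mesos_env() {
  # List the environment, keep only the names of MESOS_* variables,
  # and unset them one by one in the current shell.
  for name in $(env | sed -n 's/^\(MESOS_[A-Za-z0-9_]*\)=.*/\1/p'); do
    unset "$name"
  done
}
```

In a Chronos job, the function would be called before the existing spark-submit invocation, for example by wrapping both in a small script that the "command" field points to. The function has to run in the same shell that then launches spark-submit, otherwise the variables are still inherited.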

gkleiman avatar Nov 22 '17 10:11 gkleiman