When Chronos is run using Marathon, jobs submitted to Chronos do not run

Open mindscratch opened this issue 9 years ago • 15 comments

I have mesos 0.21.1, marathon 0.7.6 and chronos 2.3.0. I've deployed chronos using Marathon.

I am able to create a chronos job, where the command is simple like "echo hello", however, the job never runs. The "success" and "error" counts for the job are always 0. If I tail the chronos logs, I don't see any errors. Also, the Mesos UI shows no tasks for the jobs I submit to Chronos.

I've also tried other commands such as "echo hello > /tmp/hello.txt" and even "curl http://myserver" (so I could watch the server log to see if the job runs).

If I run Chronos outside of Marathon it works just fine.

Details:

OS: CentOS 6
Chronos installed with the chronos-2.3.0 rpm.
Command used to start chronos: /usr/local/bin/chronos
Chronos configured via /etc/chronos/conf
  /etc/chronos/conf/http_port = 8081
  /etc/chronos/conf/master = zk://myserver:2181/mesos
  /etc/chronos/conf/zk_hosts = zk://myserver:2181/mesos
Mesos version: 0.21.1
java -version: 1.7.0_55

mindscratch avatar Jan 15 '15 18:01 mindscratch

Here's what I see in the logs when I submit a simple job named "say hello" that is scheduled to run every minute using the command: "/usr/bin/echo hello":

WARN adding vertex: say hello
WARN Current number of vertices:1
Persisting job: say hello
Persisting job 'say hello' with data
State J_say hello does not exist yet. Adding to state
State update successful: true
Adding schedule for time:5:41:25 PM UTC
Checking schedules with time horizon:PT60S
Calling next for stream: R/2015-01-16T17:41:03Z/PT1M, jobname: say hello
Task ready for scheduling: 2015-01-16T17:41:03.000Z
Scheduling:say hello
Scheduling task 'ct:1421430063000:0:say hello does not exist yet. Adding to state
State update successful true
Saving updated job:ScheduleBasedJob(....)
Triggering: 'say hello'
removing task mapping

mindscratch avatar Jan 16 '15 17:01 mindscratch

I disabled iptables on all hosts and now it works...looks like a network configuration issue. Chronos (2.3.0) on Marathon (0.7.6) is working just fine.

mindscratch avatar Jan 16 '15 18:01 mindscratch

Thanks @mindscratch for debugging this! The only thing we need to close this issue is some documentation in the readme: the LIBPROCESS_IP environment variable should be set to a PORT that the Mesos Master can communicate with.

elingg avatar Jan 16 '15 23:01 elingg

@elingg slight correction, that should be

The LIBPROCESS_PORT environment variable should be set to a PORT...

instead of LIBPROCESS_IP

To start Chronos I created the Marathon application using the following (only the command is shown, for brevity):

{
   "cmd": "LIBPROCESS_PORT=9000 /usr/local/bin/chronos --master zk://localhost:2181/mesos --zk_hosts zk://localhost:2181/mesos --http_port $PORT"
}
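
Since the root cause here was a firewall, it can help to confirm that the port chosen for LIBPROCESS_PORT is actually reachable before blaming Chronos. The following is only a minimal sketch: the function name `port_reachable` and the host name in the usage comment are hypothetical, and it checks plain TCP connectivity only, which is an assumption about what the Mesos master needs. Run it from the Mesos master host against the host where Chronos is running.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example usage (hypothetical host; 9000 is the LIBPROCESS_PORT from the command above):
#   port_reachable("chronos-host.example", 9000)
```

If this returns False while Chronos is up, a firewall rule (such as iptables) between the two hosts is a likely suspect.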

mindscratch avatar Jan 17 '15 00:01 mindscratch

correct, thx!

elingg avatar Jan 17 '15 00:01 elingg

@elingg, @mindscratch note that this will only work if the IP is visible from the Mesos master. If, for example, you're running from within a Docker container, you'd have to use host networking (--net=host) and set LIBPROCESS_IP to the public IP.

See https://issues.apache.org/jira/browse/MESOS-2587 for details

clehene avatar Apr 02 '15 00:04 clehene

@mindscratch Can you share the contents of your JSON file? I don't know how to make Chronos run using Marathon. Thanks.

wangqunOne avatar Apr 25 '16 06:04 wangqunOne

Is there a Marathon reference config file?

robsonpeixoto avatar May 19 '16 22:05 robsonpeixoto

@wangqunOne I'll post something on Monday.

mindscratch avatar May 20 '16 23:05 mindscratch

Where, @mindscratch?

robsonpeixoto avatar May 21 '16 01:05 robsonpeixoto

I'll share in this comment.

mindscratch avatar May 21 '16 11:05 mindscratch

Marathon configuration for running Chronos:

{
  "id": "chronos",
  "cmd": "LIBPROCESS_PORT=6500 ./chronos --http_port $PORT",
  "cpus": 1,
  "mem": 512,
  "instances": 1,
  "uris": ["http://myfileserver/chronos-2.4.0.tgz"],
  "ports": [4400],
  "requirePorts": true,
  "healthChecks": [
    {"protocol": "HTTP", "path": "/scheduler/jobs"}
  ]
}
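
For anyone unsure how to hand this definition to Marathon: apps are created by POSTing the JSON to Marathon's `/v2/apps` REST endpoint. The following is a rough sketch using only the Python standard library. The Marathon address `http://marathon.example:8080` and the helper names `build_request` and `submit` are hypothetical, and the app definition simply mirrors the one above.

```python
import json
import urllib.request

# App definition copied from the configuration above.
CHRONOS_APP = {
    "id": "chronos",
    "cmd": "LIBPROCESS_PORT=6500 ./chronos --http_port $PORT",
    "cpus": 1,
    "mem": 512,
    "instances": 1,
    "uris": ["http://myfileserver/chronos-2.4.0.tgz"],
    "ports": [4400],
    "requirePorts": True,
    "healthChecks": [{"protocol": "HTTP", "path": "/scheduler/jobs"}],
}


def build_request(marathon_url: str, app: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates `app` in Marathon."""
    return urllib.request.Request(
        url=marathon_url.rstrip("/") + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(marathon_url: str, app: dict) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(build_request(marathon_url, app), timeout=10) as resp:
        return resp.status


# Example usage (hypothetical Marathon address):
#   submit("http://marathon.example:8080", CHRONOS_APP)
```

Building the request separately from sending it makes it easy to inspect the payload before anything touches the cluster.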

mindscratch avatar May 23 '16 10:05 mindscratch

How about a marathon config for running chronos in a docker container?

payneio avatar Feb 11 '17 00:02 payneio

Something like this should work, with constraints:

{
  "id": "chronos",
  "args": [
    "--mesos_role=private",
    "--mesos_framework_name=chronos",
    "--hostname=<hostname>",
    "--master=zk://<ip>:2181,<ip>:2181,<ip>:2181/mesos",
    "--zk_hosts=zk://<ip>:2181,<ip>:2181,<ip>:2181",
    "--http_credentials=username:pass"
  ],
  "cpus": 0.5,
  "ports": [8080, 8081],
  "constraints": [["hostname", "LIKE", "<hostname>"]],
  "mem": 500.0,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "mesosphere/chronos:v3.0.0",
      "forcePullImage": true,
      "network": "HOST"
    }
  },
  "healthChecks": [
      {
        "path": "/",
        "port": 8080,
        "protocol": "HTTP",
        "gracePeriodSeconds": 300,
        "intervalSeconds": 60,
        "timeoutSeconds": 20,
        "maxConsecutiveFailures": 3,
        "ignoreHttp1xx": false
      }
  ],
  "env": {
    "PORT0": "8080",
    "PORT1": "8081"
  }
}

ianjuma avatar Apr 22 '17 01:04 ianjuma

I was able to run Chronos and schedule a job, but the job just stays there. I noticed that the Chronos framework becomes inactive in Mesos after a couple of minutes.
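
One way to watch for this is to inspect the state JSON served by the Mesos master and look for frameworks that are registered but not active. The sketch below works on an already-parsed state document. It assumes the document has a top-level `frameworks` list whose entries carry `id`, `name`, and `active` fields; that shape should be checked against your Mesos version. The function name `inactive_frameworks` and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def inactive_frameworks(state: Dict[str, Any], name_prefix: str = "chronos") -> List[str]:
    """Return IDs of frameworks whose name starts with `name_prefix` and that are not active."""
    return [
        fw.get("id", "<unknown>")
        for fw in state.get("frameworks", [])
        # Assumed fields: "name" (string) and "active" (boolean).
        if fw.get("name", "").startswith(name_prefix) and not fw.get("active", False)
    ]


# Made-up sample data in the assumed shape, for illustration only.
sample_state = {
    "frameworks": [
        {"id": "fw-1", "name": "marathon", "active": True},
        {"id": "fw-2", "name": "chronos", "active": False},
    ]
}
print(inactive_frameworks(sample_state))
```

A framework that shows up as inactive shortly after registering often points back to the connectivity problem discussed earlier in this thread, where the master cannot reach the scheduler.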

yogeshnath avatar Sep 15 '17 18:09 yogeshnath