
When Chronos is run using Marathon, jobs submitted to Chronos do not run

Open mindscratch opened this issue 10 years ago • 15 comments

I have mesos 0.21.1, marathon 0.7.6 and chronos 2.3.0. I've deployed chronos using Marathon.

I am able to create a Chronos job with a simple command like "echo hello"; however, the job never runs. The "success" and "error" counts for the job are always 0. If I tail the Chronos logs, I don't see any errors. Also, the Mesos UI shows no tasks for the jobs I submit to Chronos.

I've also tried other commands such as "echo hello > /tmp/hello.txt" and even "curl http://myserver" (so I could watch the server log to see if the job runs).

If I run Chronos outside of Marathon it works just fine.

Details:

OS: CentOS 6
Chronos installed with the chronos-2.3.0 rpm.
Command used to start chronos: /usr/local/bin/chronos
Chronos configured via /etc/chronos/conf
    /etc/chronos/conf/http_port = 8081
    /etc/chronos/conf/master = zk://myserver:2181/mesos
    /etc/chronos/conf/zk_hosts = zk://myserver:2181/mesos
Mesos version: 0.21.1
java -version: 1.7.0_55

mindscratch avatar Jan 15 '15 18:01 mindscratch

Here's what I see in the logs when I submit a simple job named "say hello" that is scheduled to run every minute using the command: "/usr/bin/echo hello":

WARN adding vertex: say hello
WARN Current number of vertices:1
Persisting job: say hellomindscratch>
Persisting job 'say hello' with data
State J_say hello does not exist yet. Adding to state
State update successful: true
Adding schedule for time:5:41:25 PM UTC
Checking schedules with tmie horizon:PT60S
Calling nextmindscratch> for stream: R/2015-01-16T17:41:03Z/PT1M, jobname: say hello
Task ready for scheduling: 2015-01-16T17:41:03.000Z
Scheduling:say hello
Scheduling task 'ct:1421430063000:0:say hello does not exist yet. Adding to state5] <mindscratch> Checking schedules with tmie horizon:PT60S
Calling nextmindscratch> for stream: R/2015-01-16T17:41:03Z/PT1M, jobname: say hello
Task ready for scheduling: 2015-01-16T17:41:03.000Z
Scheduling:say hello
Scheduling task 'ct:1421430063000:0:say hello does not exist yet. Adding to state
State update successful true
Saving updated job:ScheduleBasedJob(....)
Triggering: 'say hello'
removing task mapping 
State update successful true
Saving updated job:ScheduleBasedJob(....)
Triggering: 'say hello'
removing task mapping 

mindscratch avatar Jan 16 '15 17:01 mindscratch

I disabled iptables on all hosts and now it works...looks like a network configuration issue. Chronos (2.3.0) on Marathon (0.7.6) is working just fine.

mindscratch avatar Jan 16 '15 18:01 mindscratch

Thanks @mindscratch for debugging this! The only thing we need to close this issue is some documentation in the readme: the LIBPROCESS_IP environment variable should be set to a PORT that the Mesos Master can communicate with.

elingg avatar Jan 16 '15 23:01 elingg

@elingg slight correction, that should be

The LIBPROCESS_PORT environment variable should be set to a PORT...

instead of LIBPROCESS_IP

To start chronos I created the Marathon application using the following (only command shown for brevity) :

{
   "cmd": "LIBPROCESS_PORT=9000 /usr/local/bin/chronos --master zk://localhost:2181/mesos --zk_hosts zk://localhost:2181/mesos --http_port $PORT"
}
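
Since the root cause was the Mesos master being unable to reach the port Chronos listens on, a quick reachability check can help confirm the firewall setup. The following is only a minimal sketch in Python, not part of Chronos or Mesos: the function name `port_reachable` and the host name in the usage comment are hypothetical, and a successful TCP connect only shows the port is open, not that the framework registered correctly.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Example usage (hypothetical host; run this from the Mesos master):
#   port_reachable("chronos-host.example", 9000)
```

Running it from the Mesos master against the host where Marathon placed Chronos, with the value used for LIBPROCESS_PORT, should return True once iptables allows the traffic.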

mindscratch avatar Jan 17 '15 00:01 mindscratch

correct, thx!

elingg avatar Jan 17 '15 00:01 elingg

@elingg, @mindscratch note that this will only work if the IP is visible from the Mesos master. If, for example, you're running from within a Docker container, you'd have to use host networking (--net=host) and set LIBPROCESS_IP to the public IP.

See https://issues.apache.org/jira/browse/MESOS-2587 for details

clehene avatar Apr 02 '15 00:04 clehene

@mindscratch Can you share the contents of your JSON file? I don't know how to make Chronos run using Marathon. Thanks.

wangqunOne avatar Apr 25 '16 06:04 wangqunOne

Is there a Marathon reference config file?

robsonpeixoto avatar May 19 '16 22:05 robsonpeixoto

@wangqunOne I'll post something on Monday.

mindscratch avatar May 20 '16 23:05 mindscratch

Where @mindscratch ?

robsonpeixoto avatar May 21 '16 01:05 robsonpeixoto

I'll share in this comment.

mindscratch avatar May 21 '16 11:05 mindscratch

Marathon configuration for running Chronos:

{
  "id": "chronos",
  "cmd": "LIBPROCESS_PORT=6500 ./chronos --http_port $PORT",
  "cpus": 1,
  "mem": 512,
  "instances": 1,
  "uris": ["http://myfileserver/chronos-2.4.0.tgz"],
  "ports": [4400],
  "requirePorts": true,
  "healthChecks": [
    {"protocol": "HTTP", "path": "/scheduler/jobs"}
  ]
}
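
For anyone unsure how to hand this definition to Marathon: apps are created by POSTing the JSON to Marathon's /v2/apps endpoint. The sketch below, in Python, builds that request with the standard library. The helper name `build_marathon_request` and the Marathon URL are hypothetical placeholders, and the actual submission is left commented out so nothing is sent by accident.

```python
import json
import urllib.request


def build_marathon_request(marathon_url: str, app: dict) -> urllib.request.Request:
    """Build a POST request that would create `app` via Marathon's /v2/apps."""
    body = json.dumps(app).encode("utf-8")
    return urllib.request.Request(
        url=marathon_url.rstrip("/") + "/v2/apps",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The app definition from the comment above, as a Python dict.
app = {
    "id": "chronos",
    "cmd": "LIBPROCESS_PORT=6500 ./chronos --http_port $PORT",
    "cpus": 1,
    "mem": 512,
    "instances": 1,
    "uris": ["http://myfileserver/chronos-2.4.0.tgz"],
    "ports": [4400],
    "requirePorts": True,
    "healthChecks": [{"protocol": "HTTP", "path": "/scheduler/jobs"}],
}

# Hypothetical Marathon address; replace with your own.
req = build_marathon_request("http://marathon.example:8080", app)

# Uncomment to actually submit the app to Marathon:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Saving the JSON to a file and posting it with any HTTP client works equally well; the only parts that matter are the /v2/apps path and the application/json content type.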

mindscratch avatar May 23 '16 10:05 mindscratch

How about a Marathon config for running Chronos in a Docker container?

payneio avatar Feb 11 '17 00:02 payneio

Something like this should work, with constraints:

{
  "id": "chronos",
  "args": [
    "--mesos_role=private",
    "--mesos_framework_name=chronos" ,
    "--hostname=<hostname>",
    "--master=zk://<ip>:2181,<ip>:2181,<ip>:2181/mesos",
    "--zk_hosts=zk://<ip>:2181,<ip>:2181,<ip>:2181",
    "--http_credentials=username:pass"
  ],
  "cpus": 0.5,
  "ports": [8080, 8081],
  "constraints": [["hostname", "LIKE", "<hostname>"]],
  "mem": 500.0,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "mesosphere/chronos:v3.0.0",
      "forcePullImage": true,
      "network": "HOST"
    }
  },
  "healthChecks": [
      {
        "path": "/",
        "port": 8080,
        "protocol": "HTTP",
        "gracePeriodSeconds": 300,
        "intervalSeconds": 60,
        "timeoutSeconds": 20,
        "maxConsecutiveFailures": 3,
        "ignoreHttp1xx": false
      }
  ],
  "env": {
    "PORT0": "8080",
    "PORT1": "8081"
  }
}

ianjuma avatar Apr 22 '17 01:04 ianjuma

I was able to run Chronos and schedule a job, but the job just stays there. I noticed that the Chronos framework becomes inactive in Mesos after a couple of minutes.

yogeshnath avatar Sep 15 '17 18:09 yogeshnath