hdfs-deprecated
Stuck in NAMENODE state: NameNode started and immediately lost
I used Vagrant to start a cluster with 1 master and only 4 slaves; I pulled your Vagrant project from Banno/vagrant-mesos.
But the framework fails to start and is stuck in the NAMENODE state. The log and the Mesos UI show that the NameNode started and was immediately lost. Mesos tried to restart it, but it still stopped immediately after starting. The stdout is:
11:10:36.043 [Thread-57] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 1/1 and name counts: 1/2
11:10:36.045 [Thread-57] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'NAME_NODES'
11:10:36.046 [Thread-57] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: NAME_NODES
11:10:36.504 [Thread-58] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
11:10:36.504 [Thread-58] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "20151103-102825-169978048-5050-873-O2462"
11:10:36.505 [Thread-58] INFO o.a.m.hdfs.scheduler.NodeLauncher - Node: namenode, evaluating offer: value: "20151103-102825-169978048-5050-873-O2462"
11:10:36.536 [Thread-58] INFO o.a.mesos.hdfs.scheduler.NameNode - Already running namenode on mesos-slave1
11:10:36.538 [Thread-58] INFO o.a.m.hdfs.scheduler.NodeLauncher - Node: namenode, declining offer: value: "20151103-102825-169978048-5050-873-O2462"
11:10:38.724 [Thread-59] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.namenode.namenode.NameNodeExecutor.1446548986442 state=TASK_FAILED message=''
11:10:38.737 [Thread-59] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers of TaskStatus: task_id {
value: "task.namenode.namenode.NameNodeExecutor.1446548986442"
}
state: TASK_FAILED
slave_id {
value: "20151103-102825-169978048-5050-873-S1"
}
timestamp: 1.446549038686555E9
source: SOURCE_EXECUTOR
uuid: "\026\371*\016\351sH4\252\226\200\346\3054\002\030"
11:10:38.749 [Thread-59] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received task update for: task.namenode.namenode.NameNodeExecutor.1446548986442
11:10:38.761 [Thread-59] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1446548917315, slaveId=20151103-102825-169978048-5050-873-S1
11:10:38.762 [Thread-59] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.zkfc.namenode.NameNodeExecutor.1446548986442, slaveId=20151103-102825-169978048-5050-873-S1
11:10:38.780 [Thread-59] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 1/1 and name counts: 0/2
11:10:38.781 [Thread-59] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'NAME_NODES'
11:10:38.782 [Thread-59] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: NAME_NODES
11:10:38.812 [Thread-60] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.namenode.namenode.NameNodeExecutor.1446548986442 state=TASK_RUNNING message=''
11:10:38.813 [Thread-60] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers of TaskStatus: task_id {
value: "task.namenode.namenode.NameNodeExecutor.1446548986442"
}
state: TASK_RUNNING
slave_id {
value: "20151103-102825-169978048-5050-873-S1"
}
timestamp: 1.446549038783159E9
source: SOURCE_EXECUTOR
uuid: "\301\345,\272f>O\\\277Y5\027\260$A{"
labels {
labels {
key: "status"
value: "initialized"
}
}
11:10:38.823 [Thread-60] ERROR o.apache.mesos.hdfs.state.HdfsState - Failed to update TaskStatus with ID: task.namenode.namenode.NameNodeExecutor.1446548986442with exception: null
11:10:38.824 [Thread-60] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received task update for: task.namenode.namenode.NameNodeExecutor.1446548986442
11:10:38.838 [Thread-60] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1446548917315, slaveId=20151103-102825-169978048-5050-873-S1
11:10:38.839 [Thread-60] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.zkfc.namenode.NameNodeExecutor.1446548986442, slaveId=20151103-102825-169978048-5050-873-S1
11:10:38.860 [Thread-60] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 1/1 and name counts: 0/2
11:10:38.860 [Thread-60] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'NAME_NODES'
11:10:38.860 [Thread-60] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: NAME_NODES
11:10:40.518 [Thread-61] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
11:10:40.519 [Thread-61] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "20151103-102825-169978048-5050-873-O2466"
11:10:40.519 [Thread-61] INFO o.a.m.hdfs.scheduler.NodeLauncher - Node: namenode, evaluating offer: value: "20151103-102825-169978048-5050-873-O2466"
11:10:40.543 [Thread-61] INFO o.a.mesos.hdfs.scheduler.NameNode - We need to colocate the namenode with a journalnode and there is no journalnode running on this host. mesos-slave2
The Mesos web UI (screenshot omitted).

The mesos-site.xml is:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mesos.hdfs.data.dir</name>
<description>The primary data directory in HDFS</description>
<value>/var/lib/hdfs/data</value>
</property>
<property>
<name>mesos.hdfs.secondary.data.dir</name>
<description>The secondary data directory in HDFS</description>
<value>/var/run/hadoop-hdfs</value>
</property>
<property>
<name>mesos.hdfs.native-hadoop-binaries</name>
<description>Mark true if you have hadoop pre-installed on your host machines (otherwise it will be distributed by the scheduler)</description>
<value>false</value>
</property>
<property>
<name>mesos.hdfs.framework.mnt.path</name>
<description>Mount location (if mesos.hdfs.native-hadoop-binaries is marked false)</description>
<value>/opt/mesosphere</value>
</property>
<property>
<name>mesos.hdfs.state.zk</name>
<description>Comma-separated hostname-port pairs of zookeeper node locations for HDFS framework state information</description>
<value>mesos-master.vagrant:2181</value>
</property>
<property>
<name>mesos.master.uri</name>
<description>Zookeeper entry for mesos master location</description>
<value>zk://mesos-master.vagrant:2181/mesos</value>
</property>
<property>
<name>mesos.hdfs.zkfc.ha.zookeeper.quorum</name>
<description>Comma-separated list of zookeeper hostname-port pairs for HDFS HA features</description>
<value>mesos-master.vagrant:2181</value>
</property>
<property>
<name>mesos.hdfs.framework.name</name>
<description>Your Mesos framework name and cluster name when accessing files (hdfs://YOUR_NAME)</description>
<value>hdfs</value>
</property>
<property>
<name>mesos.hdfs.mesosdns</name>
<description>Whether to use Mesos DNS for service discovery within HDFS</description>
<value>false</value>
</property>
<property>
<name>mesos.hdfs.mesosdns.domain</name>
<description>Root domain name of Mesos DNS (usually 'mesos')</description>
<value>mesos</value>
</property>
<property>
<name>mesos.native.library</name>
<description>Location of libmesos.so</description>
<value>/usr/local/lib/libmesos.so</value>
</property>
<property>
<name>mesos.hdfs.journalnode.count</name>
<description>Number of journal nodes (must be odd)</description>
<value>1</value>
</property>
<!-- Additional settings for fine-tuning -->
<property>
<name>mesos.hdfs.jvm.overhead</name>
<description>Multiplier on resources reserved in order to account for JVM allocation</description>
<value>1</value>
</property>
<property>
<name>mesos.hdfs.hadoop.heap.size</name>
<value>128</value>
</property>
<property>
<name>mesos.hdfs.namenode.heap.size</name>
<value>128</value>
</property>
<property>
<name>mesos.hdfs.datanode.heap.size</name>
<value>128</value>
</property>
<property>
<name>mesos.hdfs.executor.heap.size</name>
<value>128</value>
</property>
<property>
<name>mesos.hdfs.executor.cpus</name>
<value>0.2</value>
</property>
<property>
<name>mesos.hdfs.namenode.cpus</name>
<value>0.2</value>
</property>
<property>
<name>mesos.hdfs.journalnode.cpus</name>
<value>0.2</value>
</property>
<property>
<name>mesos.hdfs.datanode.cpus</name>
<value>0.2</value>
</property>
<property>
<name>mesos.hdfs.user</name>
<value>root</value>
</property>
<property>
<name>mesos.hdfs.role</name>
<value>*</value>
</property>
<property>
<name>mesos.hdfs.datanode.exclusive</name>
WARNING-It is not advisable to run the datanode on same slave because of performance issues
<description>Whether to run the datanode on slave different from namenode and journal nodes</description>
<value>true</value>
</property>
</configuration>
Someone please help me; I really need to get this running.
Hi, what do the logs in your task sandboxes say for the failed tasks? As far as the Vagrant file from Banno/vagrant-mesos goes, that is an old version that has not been tested in some time. It was actually created by a partner org (Banno).
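A minimal sketch of how to read those sandbox logs directly on a slave, assuming the default /tmp work directory that shows up in the logs below and that the runs/latest symlink is present (adjust the executor pattern to the task you care about):

# Hedged sketch: locate and read the sandbox of the most recent NameNode executor run on a slave.
ls -d /tmp/slaves/*/frameworks/*/executors/executor.namenode.*/runs/latest
cat /tmp/slaves/*/frameworks/*/executors/executor.namenode.*/runs/latest/stderr
cat /tmp/slaves/*/frameworks/*/executors/executor.namenode.*/runs/latest/stdout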
@elingg thank you for your reply! I found that the NameNodes were started and running, but their status was uninitialized, so the scheduler killed them after a while and then tried to start another one. The stdout showed:
23:03:07.478 [Thread-1035] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.namenode.namenode.NameNodeExecutor.1446591678871 state=TASK_RUNNING message=''
23:03:07.480 [Thread-1035] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers of TaskStatus: task_id {
value: "task.namenode.namenode.NameNodeExecutor.1446591678871"
}
state: TASK_RUNNING
slave_id {
value: "20151103-202237-169978048-5050-1873-S0"
}
timestamp: 1.446591787466207E9
source: SOURCE_EXECUTOR
uuid: "S\351\372\261u\256B\214\261t;\234(8%6"
labels {
labels {
key: "status"
value: "uninitialized"
}
}
23:03:07.490 [Thread-1035] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received task update for: task.namenode.namenode.NameNodeExecutor.1446591678871
23:03:07.500 [Thread-1035] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1446589632519, slaveId=20151103-202237-169978048-5050-1873-S3
23:03:07.501 [Thread-1035] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1446589948240, slaveId=20151103-202237-169978048-5050-1873-S1
23:03:07.501 [Thread-1035] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.namenode.namenode.NameNodeExecutor.1446591678871, slaveId=20151103-202237-169978048-5050-1873-S0
23:03:07.502 [Thread-1035] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1446589947175, slaveId=20151103-202237-169978048-5050-1873-S0
23:03:07.522 [Thread-1035] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 3/3 and name counts: 2/2
23:03:07.531 [Thread-1035] INFO o.apache.mesos.hdfs.state.HdfsState - 0/2 NameNodes initialized.
23:03:07.531 [Thread-1035] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'NAME_NODES'
23:03:07.531 [Thread-1035] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: NAME_NODES
23:03:07.532 [Thread-1036] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.zkfc.namenode.NameNodeExecutor.1446591678871 state=TASK_RUNNING message=''
23:03:07.533 [Thread-1036] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers of TaskStatus: task_id {
value: "task.zkfc.namenode.NameNodeExecutor.1446591678871"
}
state: TASK_RUNNING
slave_id {
value: "20151103-202237-169978048-5050-1873-S0"
}
timestamp: 1.446591787468008E9
source: SOURCE_EXECUTOR
uuid: "\255\266\261\360\024qNc\254\000\335\003\016O]\353"
I can't see the sandbox log; I don't know why the master cannot access the slave. The web UI showed this:
Failed to connect to slave '20151103-202237-169978048-5050-1873-S3' on 'mesos-slave4:5051'.
Potential reasons:
The slave's hostname, 'mesos-slave4', is not accessible from your network
The slave's port, '5051', is not accessible from your network
The slave timed out or went offline
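A quick, hedged way to check the first two reasons is to probe the slave's agent endpoint directly from the master host, using the hostname and port from the message above:

# Hedged sketch: verify the slave hostname resolves and its agent port answers from the master host.
ping -c 3 mesos-slave4
curl -sf http://mesos-slave4:5051/state.json | head -c 200   # the slave serves its state as JSON on the same port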
Then I SSHed into the slave and found the mesos-slave log:
Launching task task.journalnode.journalnode.NodeExecutor.1446589948240 for framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:32:28.33340 I1103 22:32:28.333302 1664 slave.cpp:4208] Launching executor executor.journalnode.NodeExecutor.1446589948240 of framework 20151103-221029-169978048-5050-901-0000 in work directory '/tmp/slaves/20151103-202237-169978048-5050-1873-S1/frameworks/20151103-221029-169978048-5050-901-0000/executors/executor.journalnode.NodeExecutor.1446589948240/runs/36e3546f-c292-418e-aeed-92568eaf6893'
2015-11-03_22:32:28.33517 I1103 22:32:28.334951 1669 docker.cpp:772] No container info found, skipping launch
2015-11-03_22:32:28.33583 I1103 22:32:28.335490 1665 containerizer.cpp:484] Starting container '36e3546f-c292-418e-aeed-92568eaf6893' for executor 'executor.journalnode.NodeExecutor.1446589948240' of framework '20151103-221029-169978048-5050-901-0000'
2015-11-03_22:32:28.33905 I1103 22:32:28.338449 1663 launcher.cpp:130] Forked child with pid '1799' for container '36e3546f-c292-418e-aeed-92568eaf6893'
2015-11-03_22:32:28.33908 I1103 22:32:28.338742 1663 containerizer.cpp:694] Checkpointing executor's forked pid 1799 to '/tmp/meta/slaves/20151103-202237-169978048-5050-1873-S1/frameworks/20151103-221029-169978048-5050-901-0000/executors/executor.journalnode.NodeExecutor.1446589948240/runs/36e3546f-c292-418e-aeed-92568eaf6893/pids/forked.pid'
2015-11-03_22:32:28.34083 I1103 22:32:28.340762 1664 slave.cpp:1401] Queuing task 'task.journalnode.journalnode.NodeExecutor.1446589948240' for executor executor.journalnode.NodeExecutor.1446589948240 of framework '20151103-221029-169978048-5050-901-0000
2015-11-03_22:32:28.34812 I1103 22:32:28.348052 1663 fetcher.cpp:238] Fetching URIs using command '/usr/libexec/mesos/mesos-fetcher'
2015-11-03_22:32:32.43333 I1103 22:32:32.433059 1665 slave.cpp:3648] Current disk usage 4.46%. Max allowed age: 5.987967803794803days
2015-11-03_22:33:32.43675 I1103 22:33:32.436506 1664 slave.cpp:3648] Current disk usage 4.52%. Max allowed age: 5.983440275860776days
2015-11-03_22:34:22.00633 I1103 22:34:22.006160 1665 slave.cpp:3165] Monitoring executor 'executor.journalnode.NodeExecutor.1446589948240' of framework '20151103-221029-169978048-5050-901-0000' in container '36e3546f-c292-418e-aeed-92568eaf6893'
2015-11-03_22:34:22.76291 I1103 22:34:22.762608 1665 slave.cpp:2164] Got registration for executor 'executor.journalnode.NodeExecutor.1446589948240' of framework 20151103-221029-169978048-5050-901-0000 from executor(1)@192.168.33.12:44603
2015-11-03_22:34:22.76383 I1103 22:34:22.763734 1665 slave.cpp:1555] Sending queued task 'task.journalnode.journalnode.NodeExecutor.1446589948240' to executor 'executor.journalnode.NodeExecutor.1446589948240' of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:34:22.95782 I1103 22:34:22.957151 1668 slave.cpp:2531] Handling status update TASK_RUNNING (UUID: d3e51e7f-4a0a-4b0a-bdc2-7c05f0cde9a3) for task task.journalnode.journalnode.NodeExecutor.1446589948240 of framework 20151103-221029-169978048-5050-901-0000 from executor(1)@192.168.33.12:44603
2015-11-03_22:34:22.95784 I1103 22:34:22.957340 1668 status_update_manager.cpp:317] Received status update TASK_RUNNING (UUID: d3e51e7f-4a0a-4b0a-bdc2-7c05f0cde9a3) for task task.journalnode.journalnode.NodeExecutor.1446589948240 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:34:22.95786 I1103 22:34:22.957597 1668 status_update_manager.hpp:346] Checkpointing UPDATE for status update TASK_RUNNING (UUID: d3e51e7f-4a0a-4b0a-bdc2-7c05f0cde9a3) for task task.journalnode.journalnode.NodeExecutor.1446589948240 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:34:22.96231 I1103 22:34:22.961951 1668 slave.cpp:2776] Forwarding the update TASK_RUNNING (UUID: d3e51e7f-4a0a-4b0a-bdc2-7c05f0cde9a3) for task task.journalnode.journalnode.NodeExecutor.1446589948240 of framework 20151103-221029-169978048-5050-901-0000 to [email protected]:5050
2015-11-03_22:34:22.96233 I1103 22:34:22.962057 1668 slave.cpp:2709] Sending acknowledgement for status update TASK_RUNNING (UUID: d3e51e7f-4a0a-4b0a-bdc2-7c05f0cde9a3) for task task.journalnode.journalnode.NodeExecutor.1446589948240 of framework 20151103-221029-169978048-5050-901-0000 to executor(1)@192.168.33.12:44603
2015-11-03_22:34:23.03139 I1103 22:34:23.031255 1667 status_update_manager.cpp:389] Received status update acknowledgement (UUID: d3e51e7f-4a0a-4b0a-bdc2-7c05f0cde9a3) for task task.journalnode.journalnode.NodeExecutor.1446589948240 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:34:23.03144 I1103 22:34:23.031334 1667 status_update_manager.hpp:346] Checkpointing ACK for status update TASK_RUNNING (UUID: d3e51e7f-4a0a-4b0a-bdc2-7c05f0cde9a3) for task task.journalnode.journalnode.NodeExecutor.1446589948240 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:34:32.43837 I1103 22:34:32.438122 1662 slave.cpp:3648] Current disk usage 4.92%. Max allowed age: 5.955365198736122days
2015-11-03_22:35:32.43954 I1103 22:35:32.439317 1666 slave.cpp:3648] Current disk usage 4.92%. Max allowed age: 5.955878043914722days
2015-11-03_22:36:32.45008 I1103 22:36:32.449903 1665 slave.cpp:3648] Current disk usage 4.92%. Max allowed age: 5.955854603622627days
2015-11-03_22:36:44.94282 2015-11-03 22:36:44,942:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 111ms
2015-11-03_22:36:48.33689 2015-11-03 22:36:48,336:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 59ms
2015-11-03_22:37:32.45269 I1103 22:37:32.452337 1668 slave.cpp:3648] Current disk usage 4.92%. Max allowed age: 5.955826901459236days
2015-11-03_22:37:35.19007 2015-11-03 22:37:35,189:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 137ms
2015-11-03_22:38:32.45511 I1103 22:38:32.454891 1664 slave.cpp:3648] Current disk usage 4.92%. Max allowed age: 5.955798488983970days
2015-11-03_22:38:45.32138 2015-11-03 22:38:45,321:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 51ms
2015-11-03_22:39:05.41440 2015-11-03 22:39:05,414:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 70ms
2015-11-03_22:39:32.45629 I1103 22:39:32.456070 1668 slave.cpp:3648] Current disk usage 4.92%. Max allowed age: 5.955775048691875days
2015-11-03_22:40:32.45675 I1103 22:40:32.456569 1664 slave.cpp:3648] Current disk usage 4.92%. Max allowed age: 5.955747346528484days
2015-11-03_22:40:38.88019 2015-11-03 22:40:38,879:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 11ms
2015-11-03_22:40:49.01236 2015-11-03 22:40:49,011:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 120ms
2015-11-03_22:41:32.45942 I1103 22:41:32.459332 1665 slave.cpp:3648] Current disk usage 4.92%. Max allowed age: 5.955724616548276days
2015-11-03_22:42:09.15752 2015-11-03 22:42:09,157:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 43ms
2015-11-03_22:42:16.49725 I1103 22:42:16.497176 1664 slave.cpp:1144] Got assigned task task.namenode.namenode.NameNodeExecutor.1446590536443 for framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:42:16.49761 I1103 22:42:16.497565 1664 slave.cpp:1144] Got assigned task task.zkfc.namenode.NameNodeExecutor.1446590536443 for framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:42:16.49778 I1103 22:42:16.497750 1664 slave.cpp:1254] Launching task task.namenode.namenode.NameNodeExecutor.1446590536443 for framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:42:16.50457 I1103 22:42:16.504530 1664 slave.cpp:4208] Launching executor executor.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 in work directory '/tmp/slaves/20151103-202237-169978048-5050-1873-S1/frameworks/20151103-221029-169978048-5050-901-0000/executors/executor.namenode.NameNodeExecutor.1446590536443/runs/52c51d3b-854a-47f6-936e-bde752d080d8'
2015-11-03_22:42:16.50615 I1103 22:42:16.505013 1666 docker.cpp:772] No container info found, skipping launch
2015-11-03_22:42:16.50622 I1103 22:42:16.505082 1664 slave.cpp:1401] Queuing task 'task.namenode.namenode.NameNodeExecutor.1446590536443' for executor executor.namenode.NameNodeExecutor.1446590536443 of framework '20151103-221029-169978048-5050-901-0000
2015-11-03_22:42:16.50626 I1103 22:42:16.505121 1664 slave.cpp:1254] Launching task task.zkfc.namenode.NameNodeExecutor.1446590536443 for framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:42:16.50629 I1103 22:42:16.505133 1666 containerizer.cpp:484] Starting container '52c51d3b-854a-47f6-936e-bde752d080d8' for executor 'executor.namenode.NameNodeExecutor.1446590536443' of framework '20151103-221029-169978048-5050-901-0000'
2015-11-03_22:42:16.50643 I1103 22:42:16.505437 1664 slave.cpp:1401] Queuing task 'task.zkfc.namenode.NameNodeExecutor.1446590536443' for executor executor.namenode.NameNodeExecutor.1446590536443 of framework '20151103-221029-169978048-5050-901-0000
2015-11-03_22:42:16.50665 I1103 22:42:16.506427 1666 launcher.cpp:130] Forked child with pid '3568' for container '52c51d3b-854a-47f6-936e-bde752d080d8'
2015-11-03_22:42:16.50669 I1103 22:42:16.506613 1666 containerizer.cpp:694] Checkpointing executor's forked pid 3568 to '/tmp/meta/slaves/20151103-202237-169978048-5050-1873-S1/frameworks/20151103-221029-169978048-5050-901-0000/executors/executor.namenode.NameNodeExecutor.1446590536443/runs/52c51d3b-854a-47f6-936e-bde752d080d8/pids/forked.pid'
2015-11-03_22:42:16.51901 I1103 22:42:16.517597 1667 fetcher.cpp:238] Fetching URIs using command '/usr/libexec/mesos/mesos-fetcher'
2015-11-03_22:42:32.46823 I1103 22:42:32.468086 1662 slave.cpp:3648] Current disk usage 5.30%. Max allowed age: 5.929105678781134days
2015-11-03_22:43:05.90824 2015-11-03 22:43:05,907:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 14ms
2015-11-03_22:43:32.47098 I1103 22:43:32.470630 1663 slave.cpp:3648] Current disk usage 5.37%. Max allowed age: 5.923800359336632days
2015-11-03_22:43:57.54303 I1103 22:43:57.542760 1667 slave.cpp:3165] Monitoring executor 'executor.namenode.NameNodeExecutor.1446590536443' of framework '20151103-221029-169978048-5050-901-0000' in container '52c51d3b-854a-47f6-936e-bde752d080d8'
2015-11-03_22:43:58.68386 I1103 22:43:58.683781 1669 slave.cpp:2164] Got registration for executor 'executor.namenode.NameNodeExecutor.1446590536443' of framework 20151103-221029-169978048-5050-901-0000 from executor(1)@192.168.33.12:41694
2015-11-03_22:43:58.68520 I1103 22:43:58.685139 1669 slave.cpp:1555] Sending queued task 'task.namenode.namenode.NameNodeExecutor.1446590536443' to executor 'executor.namenode.NameNodeExecutor.1446590536443' of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:43:58.68559 I1103 22:43:58.685538 1669 slave.cpp:1555] Sending queued task 'task.zkfc.namenode.NameNodeExecutor.1446590536443' to executor 'executor.namenode.NameNodeExecutor.1446590536443' of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:43:58.93193 I1103 22:43:58.931197 1666 slave.cpp:2531] Handling status update TASK_RUNNING (UUID: d1681770-3994-432c-a0bf-68d26b52f50f) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 from executor(1)@192.168.33.12:41694
2015-11-03_22:43:58.93198 I1103 22:43:58.931314 1666 status_update_manager.cpp:317] Received status update TASK_RUNNING (UUID: d1681770-3994-432c-a0bf-68d26b52f50f) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:43:58.93199 I1103 22:43:58.931509 1666 status_update_manager.hpp:346] Checkpointing UPDATE for status update TASK_RUNNING (UUID: d1681770-3994-432c-a0bf-68d26b52f50f) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:43:58.93509 I1103 22:43:58.934931 1663 slave.cpp:2531] Handling status update TASK_RUNNING (UUID: b8f29b28-5482-4c0b-941a-95c59627999b) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 from executor(1)@192.168.33.12:41694
2015-11-03_22:43:58.93691 I1103 22:43:58.936573 1666 status_update_manager.cpp:317] Received status update TASK_RUNNING (UUID: b8f29b28-5482-4c0b-941a-95c59627999b) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:43:58.93693 I1103 22:43:58.936736 1666 status_update_manager.hpp:346] Checkpointing UPDATE for status update TASK_RUNNING (UUID: b8f29b28-5482-4c0b-941a-95c59627999b) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:43:58.93741 I1103 22:43:58.937026 1663 slave.cpp:2776] Forwarding the update TASK_RUNNING (UUID: d1681770-3994-432c-a0bf-68d26b52f50f) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 to [email protected]:5050
2015-11-03_22:43:58.93742 I1103 22:43:58.937126 1663 slave.cpp:2709] Sending acknowledgement for status update TASK_RUNNING (UUID: d1681770-3994-432c-a0bf-68d26b52f50f) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 to executor(1)@192.168.33.12:41694
2015-11-03_22:43:58.94042 I1103 22:43:58.940106 1666 slave.cpp:2776] Forwarding the update TASK_RUNNING (UUID: b8f29b28-5482-4c0b-941a-95c59627999b) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 to [email protected]:5050
2015-11-03_22:43:58.94044 I1103 22:43:58.940207 1666 slave.cpp:2709] Sending acknowledgement for status update TASK_RUNNING (UUID: b8f29b28-5482-4c0b-941a-95c59627999b) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 to executor(1)@192.168.33.12:41694
2015-11-03_22:43:59.00595 I1103 22:43:59.005162 1663 status_update_manager.cpp:389] Received status update acknowledgement (UUID: d1681770-3994-432c-a0bf-68d26b52f50f) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:43:59.00597 I1103 22:43:59.005221 1663 status_update_manager.hpp:346] Checkpointing ACK for status update TASK_RUNNING (UUID: d1681770-3994-432c-a0bf-68d26b52f50f) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:43:59.06186 I1103 22:43:59.061506 1663 status_update_manager.cpp:389] Received status update acknowledgement (UUID: b8f29b28-5482-4c0b-941a-95c59627999b) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:43:59.06187 I1103 22:43:59.061559 1663 status_update_manager.hpp:346] Checkpointing ACK for status update TASK_RUNNING (UUID: b8f29b28-5482-4c0b-941a-95c59627999b) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:44:06.00495 2015-11-03 22:44:06,004:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 22ms
2015-11-03_22:44:22.70103 2015-11-03 22:44:22,700:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 14ms
2015-11-03_22:44:32.47205 I1103 22:44:32.471585 1663 slave.cpp:3648] Current disk usage 5.75%. Max allowed age: 5.897817150703623days
2015-11-03_22:44:46.10124 2015-11-03 22:44:46,101:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 45ms
2015-11-03_22:45:32.47997 I1103 22:45:32.479790 1662 slave.cpp:3648] Current disk usage 5.75%. Max allowed age: 5.897784476357060days
2015-11-03_22:45:39.51854 2015-11-03 22:45:39,518:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 28ms
2015-11-03_22:46:13.02506 2015-11-03 22:46:13,024:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 140ms
2015-11-03_22:46:32.51982 I1103 22:46:32.519547 1668 slave.cpp:3648] Current disk usage 5.75%. Max allowed age: 5.897743988579803days
2015-11-03_22:46:46.44861 2015-11-03 22:46:46,448:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 60ms
2015-11-03_22:47:32.52039 I1103 22:47:32.520289 1669 slave.cpp:3648] Current disk usage 5.75%. Max allowed age: 5.897699949243137days
2015-11-03_22:48:26.27921 I1103 22:48:26.278497 1663 slave.cpp:2531] Handling status update TASK_KILLED (UUID: 4ec06aaf-2c53-4593-8ecb-0d041c2f0437) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 from executor(1)@192.168.33.12:41694
2015-11-03_22:48:26.27924 I1103 22:48:26.278674 1663 slave.cpp:2531] Handling status update TASK_KILLED (UUID: 503988d5-0d01-456e-bf21-9779ced4659a) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 from executor(1)@192.168.33.12:41694
2015-11-03_22:48:26.27925 I1103 22:48:26.278935 1663 status_update_manager.cpp:317] Received status update TASK_KILLED (UUID: 4ec06aaf-2c53-4593-8ecb-0d041c2f0437) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:48:26.27926 I1103 22:48:26.278962 1663 status_update_manager.hpp:346] Checkpointing UPDATE for status update TASK_KILLED (UUID: 4ec06aaf-2c53-4593-8ecb-0d041c2f0437) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:48:26.28469 I1103 22:48:26.283834 1663 status_update_manager.cpp:317] Received status update TASK_KILLED (UUID: 503988d5-0d01-456e-bf21-9779ced4659a) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:48:26.28472 I1103 22:48:26.283860 1663 status_update_manager.hpp:346] Checkpointing UPDATE for status update TASK_KILLED (UUID: 503988d5-0d01-456e-bf21-9779ced4659a) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:48:26.28473 I1103 22:48:26.284127 1667 slave.cpp:2776] Forwarding the update TASK_KILLED (UUID: 4ec06aaf-2c53-4593-8ecb-0d041c2f0437) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 to [email protected]:5050
2015-11-03_22:48:26.28473 I1103 22:48:26.284198 1667 slave.cpp:2709] Sending acknowledgement for status update TASK_KILLED (UUID: 4ec06aaf-2c53-4593-8ecb-0d041c2f0437) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 to executor(1)@192.168.33.12:41694
2015-11-03_22:48:26.28693 I1103 22:48:26.286715 1663 slave.cpp:2776] Forwarding the update TASK_KILLED (UUID: 503988d5-0d01-456e-bf21-9779ced4659a) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 to [email protected]:5050
2015-11-03_22:48:26.28694 I1103 22:48:26.286774 1663 slave.cpp:2709] Sending acknowledgement for status update TASK_KILLED (UUID: 503988d5-0d01-456e-bf21-9779ced4659a) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 to executor(1)@192.168.33.12:41694
2015-11-03_22:48:26.31790 I1103 22:48:26.317584 1663 status_update_manager.cpp:389] Received status update acknowledgement (UUID: 4ec06aaf-2c53-4593-8ecb-0d041c2f0437) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:48:26.31796 I1103 22:48:26.317630 1663 status_update_manager.hpp:346] Checkpointing ACK for status update TASK_KILLED (UUID: 4ec06aaf-2c53-4593-8ecb-0d041c2f0437) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:48:26.34322 I1103 22:48:26.342957 1663 status_update_manager.cpp:389] Received status update acknowledgement (UUID: 503988d5-0d01-456e-bf21-9779ced4659a) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:48:26.34328 I1103 22:48:26.342993 1663 status_update_manager.hpp:346] Checkpointing ACK for status update TASK_KILLED (UUID: 503988d5-0d01-456e-bf21-9779ced4659a) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:48:26.65705 I1103 22:48:26.656960 1667 containerizer.cpp:1123] Executor for container '52c51d3b-854a-47f6-936e-bde752d080d8' has exited
2015-11-03_22:48:26.65721 I1103 22:48:26.657150 1667 containerizer.cpp:918] Destroying container '52c51d3b-854a-47f6-936e-bde752d080d8'
2015-11-03_22:48:26.71774 I1103 22:48:26.717645 1667 slave.cpp:3223] Executor 'executor.namenode.NameNodeExecutor.1446590536443' of framework 20151103-221029-169978048-5050-901-0000 exited with status 3
2015-11-03_22:48:26.71815 I1103 22:48:26.718092 1667 slave.cpp:3332] Cleaning up executor 'executor.namenode.NameNodeExecutor.1446590536443' of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:48:26.71883 I1103 22:48:26.718670 1665 gc.cpp:56] Scheduling '/tmp/slaves/20151103-202237-169978048-5050-1873-S1/frameworks/20151103-221029-169978048-5050-901-0000/executors/executor.namenode.NameNodeExecutor.1446590536443/runs/52c51d3b-854a-47f6-936e-bde752d080d8' for gc 6.99999168454815days in the future
2015-11-03_22:48:26.71885 I1103 22:48:26.718761 1665 gc.cpp:56] Scheduling '/tmp/slaves/20151103-202237-169978048-5050-1873-S1/frameworks/20151103-221029-169978048-5050-901-0000/executors/executor.namenode.NameNodeExecutor.1446590536443' for gc 6.99999168194667days in the future
2015-11-03_22:48:26.71887 I1103 22:48:26.718811 1665 gc.cpp:56] Scheduling '/tmp/meta/slaves/20151103-202237-169978048-5050-1873-S1/frameworks/20151103-221029-169978048-5050-901-0000/executors/executor.namenode.NameNodeExecutor.1446590536443/runs/52c51d3b-854a-47f6-936e-bde752d080d8' for gc 6.99999168142519days in the future
2015-11-03_22:48:26.71894 I1103 22:48:26.718849 1665 gc.cpp:56] Scheduling '/tmp/meta/slaves/20151103-202237-169978048-5050-1873-S1/frameworks/20151103-221029-169978048-5050-901-0000/executors/executor.namenode.NameNodeExecutor.1446590536443' for gc 6.99999168104days in the future
I think you are right that Banno/vagrant-mesos may have some bugs. The mesos-slave runs tasks in Docker containers; I checked past issues, and it might not work. If you could share your Vagrantfiles, I would be very grateful. I have tried another one listed on Mesosphere that uses Chef, but since I'm not familiar with Chef, I'd prefer one that only has shell provisioners.
I see this: [docker.cpp:772] No container info found, skipping launch. It seems to be a bug in the Vagrant file or somewhere in the setup, because tasks are not finding the containers. One possibility is that the --executor_registration_timeout slave flag is not set high enough.
As far as different Vagrant files go, there are a number of them that we use. ContainerSolutions minimesos is one, https://github.com/mesosphere/playa-mesos is another, and there is another for DCOS, https://downloads.mesosphere.com/dcos/alpha/make_dcos_vagrant.sh. Some of these need to be modified to have enough slaves to run HDFS. To test, we also use DCOS on AWS, or create our own Mesos cluster in the cloud or on premises.
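A minimal sketch of raising that timeout on each slave, assuming the Mesosphere packaging that reads one flag per file under /etc/mesos-slave/ (the 5mins value is only an example):

# Hedged sketch: raise the executor registration timeout on a slave and restart it.
echo '5mins' | sudo tee /etc/mesos-slave/executor_registration_timeout
sudo service mesos-slave restart
# Equivalent flag form when launching by hand:
#   mesos-slave --master=zk://mesos-master.vagrant:2181/mesos --executor_registration_timeout=5mins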
@elingg hi, I tried another Vagrantfile, but I'm still stuck in the NAMENODE state. Here are the new errors from the NameNode executor. The hosts' hostname files are all right. Mesos is 0.22.1, the JDK is openjdk-7-jdk.
FO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/11/05 06:17:04 INFO namenode.NameNode: createNameNode [-bootstrapStandby, -force]
15/11/05 06:17:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/11/05 06:17:05 INFO ipc.Client: Retrying connect to server: mesos-slave1/192.168.33.11:50071. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:05 WARN ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at mesos-slave1/192.168.33.11:50071: Call From mesos-slave1/192.168.33.11 to mesos-slave1:50071 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
15/11/05 06:17:05 INFO ha.HealthMonitor: Entering state SERVICE_NOT_RESPONDING
15/11/05 06:17:05 INFO ha.ZKFailoverController: Local service NameNode at mesos-slave1/192.168.33.11:50071 entered state: SERVICE_NOT_RESPONDING
15/11/05 06:17:05 INFO ha.ZKFailoverController: Quitting master election for NameNode at mesos-slave1/192.168.33.11:50071 and marking that fencing is necessary
15/11/05 06:17:05 INFO ha.ActiveStandbyElector: Yielding from election
15/11/05 06:17:05 INFO zookeeper.ClientCnxn: EventThread shut down
15/11/05 06:17:05 INFO zookeeper.ZooKeeper: Session: 0x150d63e8292001f closed
15/11/05 06:17:07 INFO ipc.Client: Retrying connect to server: mesos-slave1/192.168.33.11:50071. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:07 WARN ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at mesos-slave1/192.168.33.11:50071: Call From mesos-slave1/192.168.33.11 to mesos-slave1:50071 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
15/11/05 06:17:09 INFO ipc.Client: Retrying connect to server: mesos-slave1/192.168.33.11:50071. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:09 WARN ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at mesos-slave1/192.168.33.11:50071: Call From mesos-slave1/192.168.33.11 to mesos-slave1:50071 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
15/11/05 06:17:11 INFO ipc.Client: Retrying connect to server: mesos-slave1/192.168.33.11:50071. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:11 WARN ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at mesos-slave1/192.168.33.11:50071: Call From mesos-slave1/192.168.33.11 to mesos-slave1:50071 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
15/11/05 06:17:13 INFO ipc.Client: Retrying connect to server: mesos-slave1/192.168.33.11:50071. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:13 WARN ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at mesos-slave1/192.168.33.11:50071: Call From mesos-slave1/192.168.33.11 to mesos-slave1:50071 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
15/11/05 06:17:13 INFO ipc.Client: Retrying connect to server: mesos-slave2/192.168.33.12:50071. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:14 INFO ipc.Client: Retrying connect to server: mesos-slave2/192.168.33.12:50071. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:15 INFO ipc.Client: Retrying connect to server: mesos-slave1/192.168.33.11:50071. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:15 WARN ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at mesos-slave1/192.168.33.11:50071: Call From mesos-slave1/192.168.33.11 to mesos-slave1:50071 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
15/11/05 06:17:15 INFO ipc.Client: Retrying connect to server: mesos-slave2/192.168.33.12:50071. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:16 INFO ipc.Client: Retrying connect to server: mesos-slave2/192.168.33.12:50071. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:17 INFO ipc.Client: Retrying connect to server: mesos-slave1/192.168.33.11:50071. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:17 WARN ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at mesos-slave1/192.168.33.11:50071: Call From mesos-slave1/192.168.33.11 to mesos-slave1:50071 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
15/11/05 06:17:17 INFO ipc.Client: Retrying connect to server: mesos-slave2/192.168.33.12:50071. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:18 INFO ipc.Client: Retrying connect to server: mesos-slave2/192.168.33.12:50071. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:19 INFO ipc.Client: Retrying connect to server: mesos-slave1/192.168.33.11:50071. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:19 WARN ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at mesos-slave1/192.168.33.11:50071: Call From mesos-slave1/192.168.33.11 to mesos-slave1:50071 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
15/11/05 06:17:19 INFO ipc.Client: Retrying connect to server: mesos-slave2/192.168.33.12:50071. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:20 INFO ipc.Client: Retrying connect to server: mesos-slave2/192.168.33.12:50071. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:21 INFO ipc.Client: Retrying connect to server: mesos-slave1/192.168.33.11:50071. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:21 WARN ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at mesos-slave1/192.168.33.11:50071: Call From mesos-slave1/192.168.33.11 to mesos-slave1:50071 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
15/11/05 06:17:21 INFO ipc.Client: Retrying connect to server: mesos-slave2/192.168.33.12:50071. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:22 INFO ipc.Client: Retrying connect to server: mesos-slave2/192.168.33.12:50071. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:22 FATAL ha.BootstrapStandby: Unable to fetch namespace information from active NN at mesos-slave2/192.168.33.12:50071: Call From mesos-slave1/192.168.33.11 to mesos-slave2:50071 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
15/11/05 06:17:22 INFO util.ExitUtil: Exiting with status 2
15/11/05 06:17:22 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at mesos-slave1/192.168.33.11
************************************************************/
Anyone here, please help me! I've been working on this for 4 days and it's still not working.
Looks like a networking error. Another thought: make sure you are pointing to the correct IPs for your mesos.hdfs.zkfc.ha.zookeeper.quorum. If that is not correct, you may not be able to connect to your ZKFC.
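A minimal sketch of checking that the quorum host from the config above is resolvable and answering, run from one of the slaves (ruok is ZooKeeper's standard four-letter health check):

# Hedged sketch: confirm the ZooKeeper quorum host resolves and answers on 2181.
getent hosts mesos-master.vagrant                 # does it resolve to the IP you expect?
echo ruok | nc -w 2 mesos-master.vagrant 2181     # a healthy ZooKeeper replies "imok"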
I do have a very easy option for you, though. Install DCOS and you can install HDFS with a single command: dcos package install hdfs. Here is the DCOS link: https://mesosphere.com/product/
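For reference, that single command, assuming the DCOS CLI is already installed and pointed at the cluster:

# As mentioned above, with the DCOS CLI configured against your cluster:
dcos package install hdfs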
try with mesos-dns?
Hello, we have a similar problem to the one described above. We use 3 AWS EC2 instances for HDFS; Vagrant is not used. OS: CentOS 7, Java version 1.8.0_77, 16 GB memory. Our mesos-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mesos.hdfs.constraints</name>
<value>role:hdfs</value>
</property>
<property>
<name>mesos.hdfs.data.dir</name>
<description>The primary data directory in HDFS</description>
<value>/workspace/hdfs/data/data</value>
</property>
<!-- Uncomment this to enable secondary data dir
<property>
<name>mesos.hdfs.secondary.data.dir</name>
<description>The secondary data directory in HDFS</description>
<value>/workspace/hdfs/data2</value>
</property>
-->
<property>
<name>mesos.hdfs.domain.socket.dir</name>
<description>The location used for a local socket used by the data nodes</description>
<value>/var/run/hadoop-hdfs</value>
</property>
<!-- Uncomment this to enable backup
<property>
<name>mesos.hdfs.backup.dir</name>
<description>Backup dir for HDFS</description>
<value>/tmp/nfs</value>
</property>
-->
<property>
<name>mesos.hdfs.native-hadoop-binaries</name>
<description>Mark true if you have hadoop pre-installed on your host machines (otherwise it will be distributed by the scheduler)</description>
<value>false</value>
</property>
<property>
<name>mesos.hdfs.framework.mnt.path</name>
<description>Mount location (if mesos.hdfs.native-hadoop-binaries is marked false)</description>
<value>/opt/mesosphere</value>
</property>
<property>
<name>mesos.hdfs.state.zk</name>
<description>Comma-separated hostname-port pairs of zookeeper node locations for HDFS framework state information</description>
<value>mesosmaster01.rosen.local:2181,mesosmaster02.rosen.local:2181,mesosmaster03.rosen.local:2181</value>
</property>
<property>
<name>mesos.master.uri</name>
<description>Zookeeper entry for mesos master location</description>
<value>zk://mesosmaster01.rosen.local:2181,mesosmaster02.rosen.local:2181,mesosmaster03.rosen.local:2181/mesos</value>
</property>
<property>
<name>mesos.hdfs.zkfc.ha.zookeeper.quorum</name>
<description>Comma-separated list of zookeeper hostname-port pairs for HDFS HA features</description>
<value>mesosmaster01.rosen.local:2181,mesosmaster02.rosen.local:2181,mesosmaster03.rosen.local:2181</value>
</property>
<property>
<name>mesos.hdfs.framework.name</name>
<description>Your Mesos framework name and cluster name when accessing files (hdfs://YOUR_NAME)</description>
<value>hdfs</value>
</property>
<property>
<name>mesos.hdfs.mesosdns</name>
<description>Whether to use Mesos DNS for service discovery within HDFS</description>
<value>true</value>
</property>
<property>
<name>mesos.hdfs.mesosdns.domain</name>
<description>Root domain name of Mesos DNS (usually 'mesos')</description>
<value>rosen.local</value>
</property>
<property>
<name>mesos.native.library</name>
<description>Location of libmesos.so</description>
<value>/usr/local/lib/libmesos.so</value>
</property>
<property>
<name>mesos.hdfs.journalnode.count</name>
<description>Number of journal nodes (must be odd)</description>
<value>1</value>
</property>
<property>
<name>mesos.hdfs.executor.cpus</name>
<value>0.2</value>
</property>
<property>
<name>mesos.hdfs.namenode.cpus</name>
<value>0.2</value>
</property>
<property>
<name>mesos.hdfs.journalnode.cpus</name>
<value>0.2</value>
</property>
<property>
<name>mesos.hdfs.datanode.cpus</name>
<value>0.2</value>
</property>
<property>
<name>mesos.hdfs.namenode.heap.size</name>
<value>2048</value>
</property>
<!-- Additional settings for fine-tuning -->
<!--
<property>
<name>mesos.hdfs.jvm.overhead</name>
<description>Multiplier on resources reserved in order to account for JVM allocation</description>
<value>1.35</value>
</property>
<property>
<name>mesos.hdfs.hadoop.heap.size</name>
<value>512</value>
</property>
<property>
<name>mesos.hdfs.namenode.heap.size</name>
<value>4096</value>
</property>
<property>
<name>mesos.hdfs.datanode.heap.size</name>
<value>1024</value>
</property>
<property>
<name>mesos.hdfs.executor.heap.size</name>
<value>256</value>
</property>
<property>
<name>mesos.hdfs.executor.cpus</name>
<value>0.5</value>
</property>
<property>
<name>mesos.hdfs.namenode.cpus</name>
<value>1</value>
</property>
<property>
<name>mesos.hdfs.journalnode.cpus</name>
<value>1</value>
</property>
<property>
<name>mesos.hdfs.datanode.cpus</name>
<value>1</value>
</property>
<property>
<name>mesos.hdfs.user</name>
<value>root</value>
</property>
<property>
<name>mesos.hdfs.role</name>
<value>*</value>
</property>
<property>
<name>mesos.hdfs.datanode.exclusive</name>
WARNING-It is not advisable to run the datanode on same slave because of performance issues
<description>Whether to run the datanode on slave different from namenode and journal nodes</description>
<value>true</value>
</property>
-->
</configuration>
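One thing worth double-checking with mesos.hdfs.constraints set to role:hdfs: the scheduler can only match that constraint if the agents actually advertise a matching attribute. A minimal sketch, assuming the constraint is matched against agent attributes and the Mesosphere packaging (one flag per file under /etc/mesos-slave/):

# Hedged sketch: advertise a role:hdfs attribute on each agent that should run HDFS nodes.
echo 'role:hdfs' | sudo tee /etc/mesos-slave/attributes
sudo systemctl restart mesos-slave        # CentOS 7 / systemd service name from the Mesosphere packages
# Equivalent flag form: mesos-slave ... --attributes="role:hdfs"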
The mesos-slave log:
I0412 10:03:47.383617 1128 slave.cpp:2248] Updating framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 pid to [email protected]:59562
I0412 10:03:47.383980 1128 status_update_manager.cpp:181] Resuming sending status updates
I0412 10:03:52.571465 1129 slave.cpp:1361] Got assigned task task.namenode.namenode.NameNodeExecutor.1460455432500 for framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:52.572115 1129 slave.cpp:1361] Got assigned task task.zkfc.namenode.NameNodeExecutor.1460455432500 for framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:52.572258 1129 slave.cpp:1480] Launching task task.namenode.namenode.NameNodeExecutor.1460455432500 for framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:52.577028 1129 paths.cpp:528] Trying to chown '/workspace/slaves/835221b1-c248-4c7a-b641-6b139b37a855-S37/frameworks/835221b1-c248-4c7a-b641-6b139b37a855-0002/executors/executor.namenode.NameNodeExecutor.1460455432500/runs/b736fd9e-713b-4ce3-8f6f-72489cce55c4' to user 'root'
I0412 10:03:52.587659 1129 slave.cpp:5367] Launching executor executor.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 with resources cpus(*):0.2; mem(*):345.6 in work directory '/workspace/slaves/835221b1-c248-4c7a-b641-6b139b37a855-S37/frameworks/835221b1-c248-4c7a-b641-6b139b37a855-0002/executors/executor.namenode.NameNodeExecutor.1460455432500/runs/b736fd9e-713b-4ce3-8f6f-72489cce55c4'
I0412 10:03:52.589581 1129 slave.cpp:1698] Queuing task 'task.namenode.namenode.NameNodeExecutor.1460455432500' for executor 'executor.namenode.NameNodeExecutor.1460455432500' of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:52.589629 1129 slave.cpp:1480] Launching task task.zkfc.namenode.NameNodeExecutor.1460455432500 for framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:52.589805 1129 slave.cpp:1698] Queuing task 'task.zkfc.namenode.NameNodeExecutor.1460455432500' for executor 'executor.namenode.NameNodeExecutor.1460455432500' of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:52.590296 1128 docker.cpp:1009] No container info found, skipping launch
I0412 10:03:52.590457 1133 containerizer.cpp:666] Starting container 'b736fd9e-713b-4ce3-8f6f-72489cce55c4' for executor 'executor.namenode.NameNodeExecutor.1460455432500' of framework '835221b1-c248-4c7a-b641-6b139b37a855-0002'
I0412 10:03:52.592732 1132 linux_launcher.cpp:304] Cloning child process with flags =
I0412 10:03:52.593842 1132 systemd.cpp:95] Assigned child process '20366' to 'mesos_executors.slice'
I0412 10:03:52.595650 1132 containerizer.cpp:1118] Checkpointing executor's forked pid 20366 to '/workspace/meta/slaves/835221b1-c248-4c7a-b641-6b139b37a855-S37/frameworks/835221b1-c248-4c7a-b641-6b139b37a855-0002/executors/executor.namenode.NameNodeExecutor.1460455432500/runs/b736fd9e-713b-4ce3-8f6f-72489cce55c4/pids/forked.pid'
I0412 10:03:56.420516 1130 slave.cpp:2643] Got registration for executor 'executor.namenode.NameNodeExecutor.1460455432500' of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 from executor(1)@172.32.2.201:51595
I0412 10:03:56.421440 1130 slave.cpp:1863] Sending queued task 'task.namenode.namenode.NameNodeExecutor.1460455432500' to executor 'executor.namenode.NameNodeExecutor.1460455432500' of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 at executor(1)@172.32.2.201:51595
I0412 10:03:56.421576 1130 slave.cpp:1863] Sending queued task 'task.zkfc.namenode.NameNodeExecutor.1460455432500' to executor 'executor.namenode.NameNodeExecutor.1460455432500' of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 at executor(1)@172.32.2.201:51595
I0412 10:03:56.629096 1129 slave.cpp:3002] Handling status update TASK_RUNNING (UUID: f3723e38-75ab-40d1-84ee-5e5e87066460) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 from executor(1)@172.32.2.201:51595
I0412 10:03:56.629263 1129 slave.cpp:3002] Handling status update TASK_RUNNING (UUID: 2e23a614-1365-434d-8149-3bbe82689458) for task task.zkfc.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 from executor(1)@172.32.2.201:51595
I0412 10:03:56.629564 1126 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: f3723e38-75ab-40d1-84ee-5e5e87066460) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:56.629753 1126 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_RUNNING (UUID: f3723e38-75ab-40d1-84ee-5e5e87066460) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:56.637223 1128 slave.cpp:3400] Forwarding the update TASK_RUNNING (UUID: f3723e38-75ab-40d1-84ee-5e5e87066460) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to [email protected]:5050
I0412 10:03:56.637322 1128 slave.cpp:3310] Sending acknowledgement for status update TASK_RUNNING (UUID: f3723e38-75ab-40d1-84ee-5e5e87066460) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to executor(1)@172.32.2.201:51595
I0412 10:03:56.637621 1126 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: 2e23a614-1365-434d-8149-3bbe82689458) for task task.zkfc.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:56.637817 1126 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_RUNNING (UUID: 2e23a614-1365-434d-8149-3bbe82689458) for task task.zkfc.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:56.641366 1130 slave.cpp:3400] Forwarding the update TASK_RUNNING (UUID: 2e23a614-1365-434d-8149-3bbe82689458) for task task.zkfc.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to [email protected]:5050
I0412 10:03:56.641475 1130 slave.cpp:3310] Sending acknowledgement for status update TASK_RUNNING (UUID: 2e23a614-1365-434d-8149-3bbe82689458) for task task.zkfc.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to executor(1)@172.32.2.201:51595
I0412 10:03:56.667753 1133 status_update_manager.cpp:392] Received status update acknowledgement (UUID: f3723e38-75ab-40d1-84ee-5e5e87066460) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:56.667856 1133 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_RUNNING (UUID: f3723e38-75ab-40d1-84ee-5e5e87066460) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:56.695082 1133 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 2e23a614-1365-434d-8149-3bbe82689458) for task task.zkfc.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:56.695195 1133 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_RUNNING (UUID: 2e23a614-1365-434d-8149-3bbe82689458) for task task.zkfc.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:57.437192 1133 slave.cpp:4374] Current disk usage 13.61%. Max allowed age: 5.347110459506319days
I0412 10:03:58.121980 1126 slave.cpp:3002] Handling status update TASK_FAILED (UUID: 05daa2cf-ce01-41a7-b6f2-12d218f67330) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 from executor(1)@172.32.2.201:51595
I0412 10:03:58.122598 1130 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: 05daa2cf-ce01-41a7-b6f2-12d218f67330) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:58.122643 1130 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_FAILED (UUID: 05daa2cf-ce01-41a7-b6f2-12d218f67330) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:58.131938 1126 slave.cpp:3400] Forwarding the update TASK_FAILED (UUID: 05daa2cf-ce01-41a7-b6f2-12d218f67330) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to [email protected]:5050
I0412 10:03:58.132037 1126 slave.cpp:3310] Sending acknowledgement for status update TASK_FAILED (UUID: 05daa2cf-ce01-41a7-b6f2-12d218f67330) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to executor(1)@172.32.2.201:51595
I0412 10:03:58.154698 1133 slave.cpp:3002] Handling status update TASK_RUNNING (UUID: 12c08871-27cd-4b90-9b51-7d1ef9d99801) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 from executor(1)@172.32.2.201:51595
I0412 10:03:58.155057 1127 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: 12c08871-27cd-4b90-9b51-7d1ef9d99801) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:58.155128 1127 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_RUNNING (UUID: 12c08871-27cd-4b90-9b51-7d1ef9d99801) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:58.159402 1127 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 05daa2cf-ce01-41a7-b6f2-12d218f67330) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:58.159435 1133 slave.cpp:3310] Sending acknowledgement for status update TASK_RUNNING (UUID: 12c08871-27cd-4b90-9b51-7d1ef9d99801) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to executor(1)@172.32.2.201:51595
I0412 10:03:58.159457 1127 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_FAILED (UUID: 05daa2cf-ce01-41a7-b6f2-12d218f67330) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:58.560004 1130 slave.cpp:1361] Got assigned task task.namenode.namenode.NameNodeExecutor.1460455438505 for framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:58.560318 1130 slave.cpp:1361] Got assigned task task.zkfc.namenode.NameNodeExecutor.1460455438505 for framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:58.560469 1130 slave.cpp:1480] Launching task task.namenode.namenode.NameNodeExecutor.1460455438505 for framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:58.560714 1130 paths.cpp:528] Trying to chown '/workspace/slaves/835221b1-c248-4c7a-b641-6b139b37a855-S37/frameworks/835221b1-c248-4c7a-b641-6b139b37a855-0002/executors/executor.namenode.NameNodeExecutor.1460455438505/runs/0fcf4e38-32b4-40cb-a8b7-e810ea95c7fa' to user 'root'
I0412 10:03:58.564749 1130 slave.cpp:5367] Launching executor executor.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 with resources cpus(*):0.2; mem(*):345.6 in work directory '/workspace/slaves/835221b1-c248-4c7a-b641-6b139b37a855-S37/frameworks/835221b1-c248-4c7a-b641-6b139b37a855-0002/executors/executor.namenode.NameNodeExecutor.1460455438505/runs/0fcf4e38-32b4-40cb-a8b7-e810ea95c7fa'
I0412 10:03:58.565224 1130 slave.cpp:1698] Queuing task 'task.namenode.namenode.NameNodeExecutor.1460455438505' for executor 'executor.namenode.NameNodeExecutor.1460455438505' of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:58.565279 1130 slave.cpp:1480] Launching task task.zkfc.namenode.NameNodeExecutor.1460455438505 for framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:58.565349 1127 docker.cpp:1009] No container info found, skipping launch
I0412 10:03:58.565481 1127 containerizer.cpp:666] Starting container '0fcf4e38-32b4-40cb-a8b7-e810ea95c7fa' for executor 'executor.namenode.NameNodeExecutor.1460455438505' of framework '835221b1-c248-4c7a-b641-6b139b37a855-0002'
I0412 10:03:58.565546 1130 slave.cpp:1698] Queuing task 'task.zkfc.namenode.NameNodeExecutor.1460455438505' for executor 'executor.namenode.NameNodeExecutor.1460455438505' of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:58.567698 1127 linux_launcher.cpp:304] Cloning child process with flags =
I0412 10:03:58.569170 1127 systemd.cpp:95] Assigned child process '20614' to 'mesos_executors.slice'
I0412 10:03:58.570055 1127 containerizer.cpp:1118] Checkpointing executor's forked pid 20614 to '/workspace/meta/slaves/835221b1-c248-4c7a-b641-6b139b37a855-S37/frameworks/835221b1-c248-4c7a-b641-6b139b37a855-0002/executors/executor.namenode.NameNodeExecutor.1460455438505/runs/0fcf4e38-32b4-40cb-a8b7-e810ea95c7fa/pids/forked.pid'
I0412 10:04:02.514757 1131 slave.cpp:2643] Got registration for executor 'executor.namenode.NameNodeExecutor.1460455438505' of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 from executor(1)@172.32.2.201:40841
I0412 10:04:02.515738 1131 slave.cpp:1863] Sending queued task 'task.namenode.namenode.NameNodeExecutor.1460455438505' to executor 'executor.namenode.NameNodeExecutor.1460455438505' of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 at executor(1)@172.32.2.201:40841
I0412 10:04:02.515851 1131 slave.cpp:1863] Sending queued task 'task.zkfc.namenode.NameNodeExecutor.1460455438505' to executor 'executor.namenode.NameNodeExecutor.1460455438505' of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 at executor(1)@172.32.2.201:40841
I0412 10:04:02.716392 1129 slave.cpp:3002] Handling status update TASK_RUNNING (UUID: d207da47-d11a-47dd-b69a-d4ba54465daf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 from executor(1)@172.32.2.201:40841
I0412 10:04:02.716552 1129 slave.cpp:3002] Handling status update TASK_RUNNING (UUID: 50c6c76a-40d6-4a5f-9c78-68eac6b0f5f9) for task task.zkfc.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 from executor(1)@172.32.2.201:40841
I0412 10:04:02.716835 1131 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: d207da47-d11a-47dd-b69a-d4ba54465daf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:02.717103 1131 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_RUNNING (UUID: d207da47-d11a-47dd-b69a-d4ba54465daf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:02.725347 1129 slave.cpp:3400] Forwarding the update TASK_RUNNING (UUID: d207da47-d11a-47dd-b69a-d4ba54465daf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to [email protected]:5050
I0412 10:04:02.725388 1131 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: 50c6c76a-40d6-4a5f-9c78-68eac6b0f5f9) for task task.zkfc.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:02.725458 1129 slave.cpp:3310] Sending acknowledgement for status update TASK_RUNNING (UUID: d207da47-d11a-47dd-b69a-d4ba54465daf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to executor(1)@172.32.2.201:40841
I0412 10:04:02.725586 1131 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_RUNNING (UUID: 50c6c76a-40d6-4a5f-9c78-68eac6b0f5f9) for task task.zkfc.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:02.729430 1127 slave.cpp:3400] Forwarding the update TASK_RUNNING (UUID: 50c6c76a-40d6-4a5f-9c78-68eac6b0f5f9) for task task.zkfc.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to [email protected]:5050
I0412 10:04:02.729521 1127 slave.cpp:3310] Sending acknowledgement for status update TASK_RUNNING (UUID: 50c6c76a-40d6-4a5f-9c78-68eac6b0f5f9) for task task.zkfc.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to executor(1)@172.32.2.201:40841
I0412 10:04:02.755759 1129 status_update_manager.cpp:392] Received status update acknowledgement (UUID: d207da47-d11a-47dd-b69a-d4ba54465daf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:02.755856 1129 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_RUNNING (UUID: d207da47-d11a-47dd-b69a-d4ba54465daf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:02.798437 1129 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 50c6c76a-40d6-4a5f-9c78-68eac6b0f5f9) for task task.zkfc.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:02.798527 1129 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_RUNNING (UUID: 50c6c76a-40d6-4a5f-9c78-68eac6b0f5f9) for task task.zkfc.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:04.234256 1130 slave.cpp:3002] Handling status update TASK_FAILED (UUID: 5e6ef534-21e2-4b4d-b9a7-0c79517c42bf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 from executor(1)@172.32.2.201:40841
I0412 10:04:04.234877 1130 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: 5e6ef534-21e2-4b4d-b9a7-0c79517c42bf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:04.234915 1130 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_FAILED (UUID: 5e6ef534-21e2-4b4d-b9a7-0c79517c42bf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:04.240129 1128 slave.cpp:3400] Forwarding the update TASK_FAILED (UUID: 5e6ef534-21e2-4b4d-b9a7-0c79517c42bf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to [email protected]:5050
I0412 10:04:04.240219 1128 slave.cpp:3310] Sending acknowledgement for status update TASK_FAILED (UUID: 5e6ef534-21e2-4b4d-b9a7-0c79517c42bf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to executor(1)@172.32.2.201:40841
I0412 10:04:04.254961 1130 slave.cpp:3002] Handling status update TASK_RUNNING (UUID: 0f54ea2b-b8e1-4e96-b386-b2ae9732305a) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 from executor(1)@172.32.2.201:40841
I0412 10:04:04.255317 1130 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: 0f54ea2b-b8e1-4e96-b386-b2ae9732305a) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:04.255357 1130 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_RUNNING (UUID: 0f54ea2b-b8e1-4e96-b386-b2ae9732305a) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:04.259516 1130 slave.cpp:3310] Sending acknowledgement for status update TASK_RUNNING (UUID: 0f54ea2b-b8e1-4e96-b386-b2ae9732305a) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to executor(1)@172.32.2.201:40841
I0412 10:04:04.274070 1129 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 5e6ef534-21e2-4b4d-b9a7-0c79517c42bf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:04.274127 1129 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_FAILED (UUID: 5e6ef534-21e2-4b4d-b9a7-0c79517c42bf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:57.438268 1129 slave.cpp:4374] Current disk usage 13.93%. Max allowed age: 5.324971961824769days
The hdfs-mesos scheduler output is:
10:03:47.303 [main] INFO org.eclipse.jetty.util.log - Logging initialized @604ms
10:03:47.337 [main] INFO org.eclipse.jetty.server.Server - jetty-9.2.z-SNAPSHOT
10:03:47.357 [HdfsScheduler] INFO o.a.m.hdfs.scheduler.HdfsScheduler - user: "root"
name: "hdfs"
id {
value: "835221b1-c248-4c7a-b641-6b139b37a855-0002"
}
failover_timeout: 3.14496E7
checkpoint: true
role: "*"
10:03:47.357 [HdfsScheduler] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Registering without authentication
10:03:47.371 [main] INFO o.e.jetty.server.ServerConnector - Started ServerConnector@3eb7fc54{HTTP/1.1}{0.0.0.0:8765}
10:03:47.371 [main] INFO org.eclipse.jetty.server.Server - Started @677ms
10:03:47.387 [Thread-11] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Registered framework frameworkId=835221b1-c248-4c7a-b641-6b139b37a855-0002
10:03:47.387 [Thread-11] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'RECONCILING_TASKS'
10:03:47.389 [Thread-12] INFO o.a.m.hdfs.scheduler.HdfsScheduler - =========================================
10:03:47.389 [Thread-12] INFO o.a.m.hdfs.scheduler.HdfsScheduler - pendingTasks size: 1
10:03:47.389 [Thread-12] INFO o.a.m.hdfs.scheduler.HdfsScheduler - task.journalnode.journalnode.NodeExecutor.1460452855798
10:03:47.389 [Thread-12] INFO o.a.m.hdfs.scheduler.HdfsScheduler - =========================================
10:03:47.389 [Thread-12] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Explicitly Reconciling Tasks
10:03:47.398 [Thread-12] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Implicitly Reconciling Tasks
10:03:47.398 [Thread-12] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sleeping for 4000ms before retrying reconciliation.
10:03:47.409 [Thread-13] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received 6 offers
10:03:47.411 [Thread-13] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50699"
10:03:47.411 [Thread-13] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50700"
10:03:47.411 [Thread-13] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50701"
10:03:47.411 [Thread-13] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50702"
10:03:47.411 [Thread-13] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50703"
10:03:47.412 [Thread-13] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50704"
10:03:47.440 [Thread-13] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 1/1 and name counts: 0/2
10:03:47.440 [Thread-13] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'RECONCILING_TASKS'
10:03:47.440 [Thread-13] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: RECONCILING_TASKS
10:03:47.444 [Thread-13] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Scheduler in phase: RECONCILING_TASKS, declining offer: value: "835221b1-c248-4c7a-b641-6b139b37a855-O50702"
10:03:47.446 [Thread-14] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.journalnode.journalnode.NodeExecutor.1460452855798 state=TASK_RUNNING message='Reconciliation: Latest task state'
10:03:47.449 [Thread-14] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers of TaskStatus: task_id {
value: "task.journalnode.journalnode.NodeExecutor.1460452855798"
}
state: TASK_RUNNING
message: "Reconciliation: Latest task state"
slave_id {
value: "835221b1-c248-4c7a-b641-6b139b37a855-S37"
}
timestamp: 1.460455427439771E9
source: SOURCE_MASTER
reason: REASON_RECONCILIATION
13: "\n\036\022\f172.32.2.201*\016\022\f172.32.2.201"
10:03:47.459 [Thread-14] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received task update for: task.journalnode.journalnode.NodeExecutor.1460452855798
10:03:47.459 [Thread-14] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Reconciliation is NOT complete
10:03:47.459 [Thread-14] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Reconciling Task 'task.journalnode.journalnode.NodeExecutor.1460452855798'.
10:03:47.459 [Thread-14] INFO o.a.m.hdfs.scheduler.HdfsScheduler - =========================================
10:03:47.459 [Thread-14] INFO o.a.m.hdfs.scheduler.HdfsScheduler - pendingTasks size: 0
10:03:47.459 [Thread-14] INFO o.a.m.hdfs.scheduler.HdfsScheduler - =========================================
10:03:47.459 [Thread-14] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Reconciliation is complete
10:03:47.462 [Thread-14] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1460452855798, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:03:47.472 [Thread-14] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 1/1 and name counts: 0/2
10:03:47.472 [Thread-14] INFO o.a.mesos.hdfs.state.StateMachine - Transitioning from acquisition phase 'RECONCILING_TASKS' to 'NAME_NODES'
10:03:47.472 [Thread-14] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: NAME_NODES
10:03:47.472 [Thread-15] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.journalnode.journalnode.NodeExecutor.1460452855798 state=TASK_RUNNING message='Reconciliation: Latest task state'
10:03:47.473 [Thread-15] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers of TaskStatus: task_id {
value: "task.journalnode.journalnode.NodeExecutor.1460452855798"
}
state: TASK_RUNNING
message: "Reconciliation: Latest task state"
slave_id {
value: "835221b1-c248-4c7a-b641-6b139b37a855-S37"
}
timestamp: 1.460455427440362E9
source: SOURCE_MASTER
reason: REASON_RECONCILIATION
13: "\n\036\022\f172.32.2.201*\016\022\f172.32.2.201"
10:03:47.479 [Thread-15] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received task update for: task.journalnode.journalnode.NodeExecutor.1460452855798
10:03:47.482 [Thread-15] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1460452855798, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:03:47.487 [Thread-15] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 1/1 and name counts: 0/2
10:03:47.487 [Thread-15] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'NAME_NODES'
10:03:47.487 [Thread-15] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: NAME_NODES
10:03:52.480 [Thread-16] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received 6 offers
10:03:52.481 [Thread-16] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50705"
10:03:52.481 [Thread-16] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50706"
10:03:52.481 [Thread-16] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50707"
10:03:52.481 [Thread-16] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50708"
10:03:52.481 [Thread-16] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50709"
10:03:52.482 [Thread-16] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50710"
10:03:52.483 [Thread-16] INFO o.a.m.hdfs.scheduler.NodeLauncher - Node: namenode, evaluating offer: value: "835221b1-c248-4c7a-b641-6b139b37a855-O50705"
10:03:52.484 [Thread-16] INFO o.apache.mesos.hdfs.util.DnsResolver - Resolving DNS for journalnode1.hdfs.rosen.local
10:03:52.486 [Thread-16] INFO o.apache.mesos.hdfs.util.DnsResolver - Successfully found journalnode1.hdfs.rosen.local
10:03:52.500 [Thread-16] INFO o.a.m.hdfs.scheduler.NodeLauncher - Node: namenode, accepting offer: value: "835221b1-c248-4c7a-b641-6b139b37a855-O50705"
10:03:52.523 [Thread-16] INFO o.a.mesos.hdfs.scheduler.HdfsNode - [name: "LD_LIBRARY_PATH"
value: "/usr/local/lib"
, name: "EXECUTOR_OPTS"
value: "-Xmx256m -Xms256m"
]
10:03:52.527 [Thread-16] INFO o.a.mesos.hdfs.scheduler.HdfsNode - [name: "LD_LIBRARY_PATH"
value: "/usr/local/lib"
, name: "EXECUTOR_OPTS"
value: "-Xmx256m -Xms256m"
]
10:03:52.528 [Thread-16] WARN o.apache.mesos.hdfs.state.HdfsState - Deserialization failed (as expected when setting status for the first time) with exception: java.io.EOFException
10:03:52.528 [Thread-16] INFO o.apache.mesos.hdfs.state.HdfsState - Setting new status: null
10:03:52.544 [Thread-16] WARN o.apache.mesos.hdfs.state.HdfsState - Deserialization failed (as expected when setting status for the first time) with exception: java.io.EOFException
10:03:52.544 [Thread-16] INFO o.apache.mesos.hdfs.state.HdfsState - Setting new status: null
10:03:56.627 [Thread-17] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.namenode.namenode.NameNodeExecutor.1460455432500 state=TASK_RUNNING message=''
10:03:56.629 [Thread-17] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers of TaskStatus: task_id {
value: "task.namenode.namenode.NameNodeExecutor.1460455432500"
}
state: TASK_RUNNING
slave_id {
value: "835221b1-c248-4c7a-b641-6b139b37a855-S37"
}
timestamp: 1.460455436628709E9
executor_id {
value: "executor.namenode.NameNodeExecutor.1460455432500"
}
source: SOURCE_EXECUTOR
uuid: "\363r>8u\253@\321\204\356^^\207\006d`"
labels {
labels {
key: "status"
value: "uninitialized"
}
}
13: "\n\036\022\f172.32.2.201*\016\022\f172.32.2.201"
10:03:56.636 [Thread-17] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received task update for: task.namenode.namenode.NameNodeExecutor.1460455432500
10:03:56.641 [Thread-17] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1460452855798, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:03:56.641 [Thread-17] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.namenode.namenode.NameNodeExecutor.1460455432500, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:03:56.654 [Thread-17] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 1/1 and name counts: 1/2
10:03:56.654 [Thread-17] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'NAME_NODES'
10:03:56.654 [Thread-17] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: NAME_NODES
10:03:56.654 [Thread-18] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.zkfc.namenode.NameNodeExecutor.1460455432500 state=TASK_RUNNING message=''
10:03:56.655 [Thread-18] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers of TaskStatus: task_id {
value: "task.zkfc.namenode.NameNodeExecutor.1460455432500"
}
state: TASK_RUNNING
slave_id {
value: "835221b1-c248-4c7a-b641-6b139b37a855-S37"
}
timestamp: 1.460455436628843E9
executor_id {
value: "executor.namenode.NameNodeExecutor.1460455432500"
}
source: SOURCE_EXECUTOR
uuid: ".#\246\024\023eCM\201I;\276\202h\224X"
13: "\n\036\022\f172.32.2.201*\016\022\f172.32.2.201"
10:03:56.663 [Thread-18] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received task update for: task.zkfc.namenode.NameNodeExecutor.1460455432500
10:03:56.668 [Thread-18] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1460452855798, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:03:56.669 [Thread-18] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.namenode.namenode.NameNodeExecutor.1460455432500, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:03:56.669 [Thread-18] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.zkfc.namenode.NameNodeExecutor.1460455432500, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:03:56.681 [Thread-18] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 1/1 and name counts: 1/2
10:03:56.681 [Thread-18] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'NAME_NODES'
10:03:56.681 [Thread-18] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: NAME_NODES
10:03:58.121 [Thread-19] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.namenode.namenode.NameNodeExecutor.1460455432500 state=TASK_FAILED message=''
10:03:58.122 [Thread-19] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers of TaskStatus: task_id {
value: "task.namenode.namenode.NameNodeExecutor.1460455432500"
}
state: TASK_FAILED
slave_id {
value: "835221b1-c248-4c7a-b641-6b139b37a855-S37"
}
timestamp: 1.460455438120914E9
executor_id {
value: "executor.namenode.NameNodeExecutor.1460455432500"
}
source: SOURCE_EXECUTOR
uuid: "\005\332\242\317\316\001A\247\266\362\022\322\030\366s0"
13: "\n\036\022\f172.32.2.201*\016\022\f172.32.2.201"
10:03:58.130 [Thread-19] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received task update for: task.namenode.namenode.NameNodeExecutor.1460455432500
10:03:58.134 [qtp63001505-17] WARN org.eclipse.jetty.server.HttpChannel - /hdfs-site.xml?nn=namenode1
java.io.IOException: java.io.EOFException
at org.apache.mesos.hdfs.config.ConfigServer$ServeHdfsConfigHandler.handle(ConfigServer.java:108) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.server.Server.handle(Server.java:485) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:290) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248) [hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) [hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:606) [hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:535) [hdfs-scheduler-0.1.5-uber.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
Caused by: java.io.EOFException: null
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2335) ~[na:1.8.0_77]
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2804) ~[na:1.8.0_77]
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:802) ~[na:1.8.0_77]
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299) ~[na:1.8.0_77]
at org.apache.mesos.hdfs.state.Serializer.deserialize(Serializer.java:25) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.apache.mesos.hdfs.state.HdfsState.getTask(HdfsState.java:226) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.apache.mesos.hdfs.state.HdfsState.getTasks(HdfsState.java:160) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.apache.mesos.hdfs.state.HdfsState.getTasks(HdfsState.java:169) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.apache.mesos.hdfs.state.HdfsState.getJournalNodeTasks(HdfsState.java:254) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.apache.mesos.hdfs.config.ConfigServer$ServeHdfsConfigHandler.handle(ConfigServer.java:106) ~[hdfs-scheduler-0.1.5-uber.jar:na]
... 9 common frames omitted
10:03:58.134 [Thread-19] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1460452855798, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:03:58.135 [Thread-19] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.zkfc.namenode.NameNodeExecutor.1460455432500, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:03:58.145 [Thread-19] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 1/1 and name counts: 0/2
10:03:58.145 [Thread-19] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'NAME_NODES'
10:03:58.145 [Thread-19] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: NAME_NODES
10:03:58.485 [Thread-20] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received 6 offers
10:03:58.486 [Thread-20] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50711"
10:03:58.486 [Thread-20] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50712"
10:03:58.486 [Thread-20] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50713"
10:03:58.487 [Thread-20] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50714"
10:03:58.487 [Thread-20] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50715"
10:03:58.487 [Thread-20] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50716"
10:03:58.487 [Thread-20] INFO o.a.m.hdfs.scheduler.NodeLauncher - Node: namenode, evaluating offer: value: "835221b1-c248-4c7a-b641-6b139b37a855-O50712"
10:03:58.487 [Thread-20] INFO o.apache.mesos.hdfs.util.DnsResolver - Resolving DNS for journalnode1.hdfs.rosen.local
10:03:58.487 [Thread-20] INFO o.apache.mesos.hdfs.util.DnsResolver - Successfully found journalnode1.hdfs.rosen.local
10:03:58.504 [Thread-20] INFO o.a.m.hdfs.scheduler.NodeLauncher - Node: namenode, accepting offer: value: "835221b1-c248-4c7a-b641-6b139b37a855-O50712"
10:03:58.513 [Thread-20] INFO o.a.mesos.hdfs.scheduler.HdfsNode - [name: "LD_LIBRARY_PATH"
value: "/usr/local/lib"
, name: "EXECUTOR_OPTS"
value: "-Xmx256m -Xms256m"
]
10:03:58.514 [Thread-20] INFO o.a.mesos.hdfs.scheduler.HdfsNode - [name: "LD_LIBRARY_PATH"
value: "/usr/local/lib"
, name: "EXECUTOR_OPTS"
value: "-Xmx256m -Xms256m"
]
10:03:58.515 [Thread-20] WARN o.apache.mesos.hdfs.state.HdfsState - Deserialization failed (as expected when setting status for the first time) with exception: java.io.EOFException
10:03:58.515 [Thread-20] INFO o.apache.mesos.hdfs.state.HdfsState - Setting new status: null
10:03:58.532 [Thread-20] WARN o.apache.mesos.hdfs.state.HdfsState - Deserialization failed (as expected when setting status for the first time) with exception: java.io.EOFException
10:03:58.532 [Thread-20] INFO o.apache.mesos.hdfs.state.HdfsState - Setting new status: null
10:04:02.715 [Thread-21] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.namenode.namenode.NameNodeExecutor.1460455438505 state=TASK_RUNNING message=''
10:04:02.715 [Thread-21] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers of TaskStatus: task_id {
value: "task.namenode.namenode.NameNodeExecutor.1460455438505"
}
state: TASK_RUNNING
slave_id {
value: "835221b1-c248-4c7a-b641-6b139b37a855-S37"
}
timestamp: 1.460455442715895E9
executor_id {
value: "executor.namenode.NameNodeExecutor.1460455438505"
}
source: SOURCE_EXECUTOR
uuid: "\322\a\332G\321\032G\335\266\232\324\272TF]\257"
labels {
labels {
key: "status"
value: "uninitialized"
}
}
13: "\n\036\022\f172.32.2.201*\016\022\f172.32.2.201"
10:04:02.722 [Thread-21] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received task update for: task.namenode.namenode.NameNodeExecutor.1460455438505
10:04:02.728 [Thread-21] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1460452855798, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:04:02.728 [Thread-21] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.namenode.namenode.NameNodeExecutor.1460455438505, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:04:02.728 [Thread-21] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.zkfc.namenode.NameNodeExecutor.1460455432500, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:04:02.742 [Thread-21] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 1/1 and name counts: 1/2
10:04:02.742 [Thread-21] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'NAME_NODES'
10:04:02.742 [Thread-21] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: NAME_NODES
10:04:02.742 [Thread-22] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.zkfc.namenode.NameNodeExecutor.1460455438505 state=TASK_RUNNING message=''
10:04:02.743 [Thread-22] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers of TaskStatus: task_id {
value: "task.zkfc.namenode.NameNodeExecutor.1460455438505"
}
state: TASK_RUNNING
slave_id {
value: "835221b1-c248-4c7a-b641-6b139b37a855-S37"
}
timestamp: 1.460455442716026E9
executor_id {
value: "executor.namenode.NameNodeExecutor.1460455438505"
}
source: SOURCE_EXECUTOR
uuid: "P\306\307j@\326J_\234xh\352\306\260\365\371"
13: "\n\036\022\f172.32.2.201*\016\022\f172.32.2.201"
10:04:02.750 [Thread-22] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received task update for: task.zkfc.namenode.NameNodeExecutor.1460455438505
10:04:02.756 [Thread-22] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.zkfc.namenode.NameNodeExecutor.1460455438505, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:04:02.756 [Thread-22] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1460452855798, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:04:02.756 [Thread-22] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.namenode.namenode.NameNodeExecutor.1460455438505, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:04:02.756 [Thread-22] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.zkfc.namenode.NameNodeExecutor.1460455432500, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:04:02.772 [Thread-22] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 1/1 and name counts: 1/2
10:04:02.772 [Thread-22] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'NAME_NODES'
10:04:02.772 [Thread-22] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: NAME_NODES
10:04:04.229 [Thread-23] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.namenode.namenode.NameNodeExecutor.1460455438505 state=TASK_FAILED message=''
10:04:04.230 [Thread-23] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers of TaskStatus: task_id {
value: "task.namenode.namenode.NameNodeExecutor.1460455438505"
}
state: TASK_FAILED
slave_id {
value: "835221b1-c248-4c7a-b641-6b139b37a855-S37"
}
timestamp: 1.460455444232839E9
executor_id {
value: "executor.namenode.NameNodeExecutor.1460455438505"
}
source: SOURCE_EXECUTOR
uuid: "^n\3654!\342KM\271\247\fyQ|B\277"
13: "\n\036\022\f172.32.2.201*\016\022\f172.32.2.201"
10:04:04.236 [Thread-23] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received task update for: task.namenode.namenode.NameNodeExecutor.1460455438505
10:04:04.237 [qtp63001505-17] WARN org.eclipse.jetty.server.HttpChannel - /hdfs-site.xml?nn=namenode1
java.io.IOException: java.io.EOFException
at org.apache.mesos.hdfs.config.ConfigServer$ServeHdfsConfigHandler.handle(ConfigServer.java:108) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.server.Server.handle(Server.java:485) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:290) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248) [hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) [hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:606) [hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:535) [hdfs-scheduler-0.1.5-uber.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
Caused by: java.io.EOFException: null
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2335) ~[na:1.8.0_77]
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2804) ~[na:1.8.0_77]
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:802) ~[na:1.8.0_77]
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299) ~[na:1.8.0_77]
at org.apache.mesos.hdfs.state.Serializer.deserialize(Serializer.java:25) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.apache.mesos.hdfs.state.HdfsState.getTask(HdfsState.java:226) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.apache.mesos.hdfs.state.HdfsState.getTasks(HdfsState.java:160) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.apache.mesos.hdfs.state.HdfsState.getTasks(HdfsState.java:169) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.apache.mesos.hdfs.state.HdfsState.getNameNodeTasks(HdfsState.java:249) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.apache.mesos.hdfs.config.ConfigServer$ServeHdfsConfigHandler.handle(ConfigServer.java:105) ~[hdfs-scheduler-0.1.5-uber.jar:na]
... 9 common frames omitted
10:04:04.241 [Thread-23] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.zkfc.namenode.NameNodeExecutor.1460455438505, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:04:04.241 [Thread-23] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1460452855798, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:04:04.241 [Thread-23] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.zkfc.namenode.NameNodeExecutor.1460455432500, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:04:04.260 [Thread-23] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 1/1 and name counts: 0/2
10:04:04.260 [Thread-23] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'NAME_NODES'
10:04:04.260 [Thread-23] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: NAME_NODES
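The ConfigServer warnings look like the most concrete symptom: both TASK_FAILED updates are accompanied by an EOFException raised while serving /hdfs-site.xml?nn=namenode1, which I assume the freshly launched namenode fetches during startup. The exception itself is the one ObjectInputStream throws when it is constructed over zero bytes, which is presumably what Serializer.deserialize gets back when a task entry exists in the state store but has no serialized value yet (I am only going by the stack trace here, not the project's source). A minimal, self-contained sketch that reproduces just that exception under those assumptions (the class name and the empty byte array are mine, purely for illustration):

import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.ObjectInputStream;

// Illustrative only: shows why an empty stored value produces the
// EOFException seen at ObjectInputStream.<init> in the stack trace above.
public class EmptyStateRepro {
    public static void main(String[] args) throws Exception {
        byte[] storedValue = new byte[0]; // stand-in for an empty/uninitialized state entry
        try {
            // The constructor reads the stream header first, so it fails
            // before readObject() is ever reached.
            new ObjectInputStream(new ByteArrayInputStream(storedValue));
        } catch (EOFException e) {
            System.out.println("EOFException from the constructor: " + e);
        }
    }
}

If the config request fails in this way while the executor is fetching hdfs-site.xml, the namenode could not come up and would report TASK_FAILED right after TASK_RUNNING, which would match the pattern in the logs above.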
To summarize: the namenode task is relaunched every few seconds and fails almost immediately after reporting TASK_RUNNING, and each failure coincides with the scheduler's ConfigServer throwing an EOFException while serving /hdfs-site.xml?nn=namenode1. What is your recommendation?