hdfs-deprecated
Stuck in NAMENODE state: NameNode started and immediately lost
I used Vagrant to start a cluster with 1 master and only 4 slaves; I pulled your Vagrant project from Banno/vagrant-mesos.
But the framework fails to start and is stuck in the NAMENODE state. The log and the Mesos UI show that the NameNode started and was immediately lost. Mesos tried to restart it, but it still stopped immediately after starting. The stdout is:
11:10:36.043 [Thread-57] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 1/1 and name counts: 1/2
11:10:36.045 [Thread-57] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'NAME_NODES'
11:10:36.046 [Thread-57] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: NAME_NODES
11:10:36.504 [Thread-58] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
11:10:36.504 [Thread-58] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "20151103-102825-169978048-5050-873-O2462"
11:10:36.505 [Thread-58] INFO o.a.m.hdfs.scheduler.NodeLauncher - Node: namenode, evaluating offer: value: "20151103-102825-169978048-5050-873-O2462"
11:10:36.536 [Thread-58] INFO o.a.mesos.hdfs.scheduler.NameNode - Already running namenode on mesos-slave1
11:10:36.538 [Thread-58] INFO o.a.m.hdfs.scheduler.NodeLauncher - Node: namenode, declining offer: value: "20151103-102825-169978048-5050-873-O2462"
11:10:38.724 [Thread-59] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.namenode.namenode.NameNodeExecutor.1446548986442 state=TASK_FAILED message=''
11:10:38.737 [Thread-59] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers of TaskStatus: task_id {
value: "task.namenode.namenode.NameNodeExecutor.1446548986442"
}
state: TASK_FAILED
slave_id {
value: "20151103-102825-169978048-5050-873-S1"
}
timestamp: 1.446549038686555E9
source: SOURCE_EXECUTOR
uuid: "\026\371*\016\351sH4\252\226\200\346\3054\002\030"
11:10:38.749 [Thread-59] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received task update for: task.namenode.namenode.NameNodeExecutor.1446548986442
11:10:38.761 [Thread-59] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1446548917315, slaveId=20151103-102825-169978048-5050-873-S1
11:10:38.762 [Thread-59] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.zkfc.namenode.NameNodeExecutor.1446548986442, slaveId=20151103-102825-169978048-5050-873-S1
11:10:38.780 [Thread-59] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 1/1 and name counts: 0/2
11:10:38.781 [Thread-59] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'NAME_NODES'
11:10:38.782 [Thread-59] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: NAME_NODES
11:10:38.812 [Thread-60] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.namenode.namenode.NameNodeExecutor.1446548986442 state=TASK_RUNNING message=''
11:10:38.813 [Thread-60] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers of TaskStatus: task_id {
value: "task.namenode.namenode.NameNodeExecutor.1446548986442"
}
state: TASK_RUNNING
slave_id {
value: "20151103-102825-169978048-5050-873-S1"
}
timestamp: 1.446549038783159E9
source: SOURCE_EXECUTOR
uuid: "\301\345,\272f>O\\\277Y5\027\260$A{"
labels {
labels {
key: "status"
value: "initialized"
}
}
11:10:38.823 [Thread-60] ERROR o.apache.mesos.hdfs.state.HdfsState - Failed to update TaskStatus with ID: task.namenode.namenode.NameNodeExecutor.1446548986442with exception: null
11:10:38.824 [Thread-60] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received task update for: task.namenode.namenode.NameNodeExecutor.1446548986442
11:10:38.838 [Thread-60] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1446548917315, slaveId=20151103-102825-169978048-5050-873-S1
11:10:38.839 [Thread-60] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.zkfc.namenode.NameNodeExecutor.1446548986442, slaveId=20151103-102825-169978048-5050-873-S1
11:10:38.860 [Thread-60] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 1/1 and name counts: 0/2
11:10:38.860 [Thread-60] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'NAME_NODES'
11:10:38.860 [Thread-60] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: NAME_NODES
11:10:40.518 [Thread-61] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
11:10:40.519 [Thread-61] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "20151103-102825-169978048-5050-873-O2466"
11:10:40.519 [Thread-61] INFO o.a.m.hdfs.scheduler.NodeLauncher - Node: namenode, evaluating offer: value: "20151103-102825-169978048-5050-873-O2466"
11:10:40.543 [Thread-61] INFO o.a.mesos.hdfs.scheduler.NameNode - We need to colocate the namenode with a journalnode and there is no journalnode running on this host. mesos-slave2
The Mesos web UI (screenshot omitted).

The mesos-site.xml is:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mesos.hdfs.data.dir</name>
<description>The primary data directory in HDFS</description>
<value>/var/lib/hdfs/data</value>
</property>
<property>
<name>mesos.hdfs.secondary.data.dir</name>
<description>The secondary data directory in HDFS</description>
<value>/var/run/hadoop-hdfs</value>
</property>
<property>
<name>mesos.hdfs.native-hadoop-binaries</name>
<description>Mark true if you have hadoop pre-installed on your host machines (otherwise it will be distributed by the scheduler)</description>
<value>false</value>
</property>
<property>
<name>mesos.hdfs.framework.mnt.path</name>
<description>Mount location (if mesos.hdfs.native-hadoop-binaries is marked false)</description>
<value>/opt/mesosphere</value>
</property>
<property>
<name>mesos.hdfs.state.zk</name>
<description>Comma-separated hostname-port pairs of zookeeper node locations for HDFS framework state information</description>
<value>mesos-master.vagrant:2181</value>
</property>
<property>
<name>mesos.master.uri</name>
<description>Zookeeper entry for mesos master location</description>
<value>zk://mesos-master.vagrant:2181/mesos</value>
</property>
<property>
<name>mesos.hdfs.zkfc.ha.zookeeper.quorum</name>
<description>Comma-separated list of zookeeper hostname-port pairs for HDFS HA features</description>
<value>mesos-master.vagrant:2181</value>
</property>
<property>
<name>mesos.hdfs.framework.name</name>
<description>Your Mesos framework name and cluster name when accessing files (hdfs://YOUR_NAME)</description>
<value>hdfs</value>
</property>
<property>
<name>mesos.hdfs.mesosdns</name>
<description>Whether to use Mesos DNS for service discovery within HDFS</description>
<value>false</value>
</property>
<property>
<name>mesos.hdfs.mesosdns.domain</name>
<description>Root domain name of Mesos DNS (usually 'mesos')</description>
<value>mesos</value>
</property>
<property>
<name>mesos.native.library</name>
<description>Location of libmesos.so</description>
<value>/usr/local/lib/libmesos.so</value>
</property>
<property>
<name>mesos.hdfs.journalnode.count</name>
<description>Number of journal nodes (must be odd)</description>
<value>1</value>
</property>
<!-- Additional settings for fine-tuning -->
<property>
<name>mesos.hdfs.jvm.overhead</name>
<description>Multiplier on resources reserved in order to account for JVM allocation</description>
<value>1</value>
</property>
<property>
<name>mesos.hdfs.hadoop.heap.size</name>
<value>128</value>
</property>
<property>
<name>mesos.hdfs.namenode.heap.size</name>
<value>128</value>
</property>
<property>
<name>mesos.hdfs.datanode.heap.size</name>
<value>128</value>
</property>
<property>
<name>mesos.hdfs.executor.heap.size</name>
<value>128</value>
</property>
<property>
<name>mesos.hdfs.executor.cpus</name>
<value>0.2</value>
</property>
<property>
<name>mesos.hdfs.namenode.cpus</name>
<value>0.2</value>
</property>
<property>
<name>mesos.hdfs.journalnode.cpus</name>
<value>0.2</value>
</property>
<property>
<name>mesos.hdfs.datanode.cpus</name>
<value>0.2</value>
</property>
<property>
<name>mesos.hdfs.user</name>
<value>root</value>
</property>
<property>
<name>mesos.hdfs.role</name>
<value>*</value>
</property>
<property>
<name>mesos.hdfs.datanode.exclusive</name>
WARNING-It is not advisable to run the datanode on same slave because of performance issues
<description>Whether to run the datanode on slave different from namenode and journal nodes</description>
<value>true</value>
</property>
</configuration>
Someone please help me; I really need to get this running.
Hi, what do the logs in your task sandboxes say for the failed tasks? As far as the Vagrant file from Banno/vagrant-mesos goes, that is an old version that has not been tested in some time. It was actually created by a partner org (Banno).
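A minimal sketch of how to read those sandbox logs directly on a slave, assuming the default /tmp work directory that shows up in the logs below and that the runs/latest symlink is present (adjust the executor pattern to the task you care about):

# Hedged sketch: locate and read the sandbox of the most recent NameNode executor run on a slave.
ls -d /tmp/slaves/*/frameworks/*/executors/executor.namenode.*/runs/latest
cat /tmp/slaves/*/frameworks/*/executors/executor.namenode.*/runs/latest/stderr
cat /tmp/slaves/*/frameworks/*/executors/executor.namenode.*/runs/latest/stdout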
@elingg thank you for your reply! I found that the NameNodes were started and running, but their status was uninitialized, so the scheduler killed them after a while and then tried to start another one. The stdout showed:
23:03:07.478 [Thread-1035] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.namenode.namenode.NameNodeExecutor.1446591678871 state=TASK_RUNNING message=''
23:03:07.480 [Thread-1035] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers of TaskStatus: task_id {
value: "task.namenode.namenode.NameNodeExecutor.1446591678871"
}
state: TASK_RUNNING
slave_id {
value: "20151103-202237-169978048-5050-1873-S0"
}
timestamp: 1.446591787466207E9
source: SOURCE_EXECUTOR
uuid: "S\351\372\261u\256B\214\261t;\234(8%6"
labels {
labels {
key: "status"
value: "uninitialized"
}
}
23:03:07.490 [Thread-1035] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received task update for: task.namenode.namenode.NameNodeExecutor.1446591678871
23:03:07.500 [Thread-1035] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1446589632519, slaveId=20151103-202237-169978048-5050-1873-S3
23:03:07.501 [Thread-1035] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1446589948240, slaveId=20151103-202237-169978048-5050-1873-S1
23:03:07.501 [Thread-1035] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.namenode.namenode.NameNodeExecutor.1446591678871, slaveId=20151103-202237-169978048-5050-1873-S0
23:03:07.502 [Thread-1035] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1446589947175, slaveId=20151103-202237-169978048-5050-1873-S0
23:03:07.522 [Thread-1035] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 3/3 and name counts: 2/2
23:03:07.531 [Thread-1035] INFO o.apache.mesos.hdfs.state.HdfsState - 0/2 NameNodes initialized.
23:03:07.531 [Thread-1035] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'NAME_NODES'
23:03:07.531 [Thread-1035] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: NAME_NODES
23:03:07.532 [Thread-1036] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.zkfc.namenode.NameNodeExecutor.1446591678871 state=TASK_RUNNING message=''
23:03:07.533 [Thread-1036] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers of TaskStatus: task_id {
value: "task.zkfc.namenode.NameNodeExecutor.1446591678871"
}
state: TASK_RUNNING
slave_id {
value: "20151103-202237-169978048-5050-1873-S0"
}
timestamp: 1.446591787468008E9
source: SOURCE_EXECUTOR
uuid: "\255\266\261\360\024qNc\254\000\335\003\016O]\353"
I can't see the sandbox log; I don't know why the master cannot access the slave. The web UI showed this:
Failed to connect to slave '20151103-202237-169978048-5050-1873-S3' on 'mesos-slave4:5051'.
Potential reasons:
The slave's hostname, 'mesos-slave4', is not accessible from your network
The slave's port, '5051', is not accessible from your network
The slave timed out or went offline
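A quick, hedged way to check the first two reasons is to probe the slave's agent endpoint directly from the master host, using the hostname and port from the message above:

# Hedged sketch: verify the slave hostname resolves and its agent port answers from the master host.
ping -c 3 mesos-slave4
curl -sf http://mesos-slave4:5051/state.json | head -c 200   # the slave serves its state as JSON on the same port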
Then I SSHed into the slave and found the mesos-slave log:
Launching task task.journalnode.journalnode.NodeExecutor.1446589948240 for framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:32:28.33340 I1103 22:32:28.333302 1664 slave.cpp:4208] Launching executor executor.journalnode.NodeExecutor.1446589948240 of framework 20151103-221029-169978048-5050-901-0000 in work directory '/tmp/slaves/20151103-202237-169978048-5050-1873-S1/frameworks/20151103-221029-169978048-5050-901-0000/executors/executor.journalnode.NodeExecutor.1446589948240/runs/36e3546f-c292-418e-aeed-92568eaf6893'
2015-11-03_22:32:28.33517 I1103 22:32:28.334951 1669 docker.cpp:772] No container info found, skipping launch
2015-11-03_22:32:28.33583 I1103 22:32:28.335490 1665 containerizer.cpp:484] Starting container '36e3546f-c292-418e-aeed-92568eaf6893' for executor 'executor.journalnode.NodeExecutor.1446589948240' of framework '20151103-221029-169978048-5050-901-0000'
2015-11-03_22:32:28.33905 I1103 22:32:28.338449 1663 launcher.cpp:130] Forked child with pid '1799' for container '36e3546f-c292-418e-aeed-92568eaf6893'
2015-11-03_22:32:28.33908 I1103 22:32:28.338742 1663 containerizer.cpp:694] Checkpointing executor's forked pid 1799 to '/tmp/meta/slaves/20151103-202237-169978048-5050-1873-S1/frameworks/20151103-221029-169978048-5050-901-0000/executors/executor.journalnode.NodeExecutor.1446589948240/runs/36e3546f-c292-418e-aeed-92568eaf6893/pids/forked.pid'
2015-11-03_22:32:28.34083 I1103 22:32:28.340762 1664 slave.cpp:1401] Queuing task 'task.journalnode.journalnode.NodeExecutor.1446589948240' for executor executor.journalnode.NodeExecutor.1446589948240 of framework '20151103-221029-169978048-5050-901-0000
2015-11-03_22:32:28.34812 I1103 22:32:28.348052 1663 fetcher.cpp:238] Fetching URIs using command '/usr/libexec/mesos/mesos-fetcher'
2015-11-03_22:32:32.43333 I1103 22:32:32.433059 1665 slave.cpp:3648] Current disk usage 4.46%. Max allowed age: 5.987967803794803days
2015-11-03_22:33:32.43675 I1103 22:33:32.436506 1664 slave.cpp:3648] Current disk usage 4.52%. Max allowed age: 5.983440275860776days
2015-11-03_22:34:22.00633 I1103 22:34:22.006160 1665 slave.cpp:3165] Monitoring executor 'executor.journalnode.NodeExecutor.1446589948240' of framework '20151103-221029-169978048-5050-901-0000' in container '36e3546f-c292-418e-aeed-92568eaf6893'
2015-11-03_22:34:22.76291 I1103 22:34:22.762608 1665 slave.cpp:2164] Got registration for executor 'executor.journalnode.NodeExecutor.1446589948240' of framework 20151103-221029-169978048-5050-901-0000 from executor(1)@192.168.33.12:44603
2015-11-03_22:34:22.76383 I1103 22:34:22.763734 1665 slave.cpp:1555] Sending queued task 'task.journalnode.journalnode.NodeExecutor.1446589948240' to executor 'executor.journalnode.NodeExecutor.1446589948240' of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:34:22.95782 I1103 22:34:22.957151 1668 slave.cpp:2531] Handling status update TASK_RUNNING (UUID: d3e51e7f-4a0a-4b0a-bdc2-7c05f0cde9a3) for task task.journalnode.journalnode.NodeExecutor.1446589948240 of framework 20151103-221029-169978048-5050-901-0000 from executor(1)@192.168.33.12:44603
2015-11-03_22:34:22.95784 I1103 22:34:22.957340 1668 status_update_manager.cpp:317] Received status update TASK_RUNNING (UUID: d3e51e7f-4a0a-4b0a-bdc2-7c05f0cde9a3) for task task.journalnode.journalnode.NodeExecutor.1446589948240 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:34:22.95786 I1103 22:34:22.957597 1668 status_update_manager.hpp:346] Checkpointing UPDATE for status update TASK_RUNNING (UUID: d3e51e7f-4a0a-4b0a-bdc2-7c05f0cde9a3) for task task.journalnode.journalnode.NodeExecutor.1446589948240 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:34:22.96231 I1103 22:34:22.961951 1668 slave.cpp:2776] Forwarding the update TASK_RUNNING (UUID: d3e51e7f-4a0a-4b0a-bdc2-7c05f0cde9a3) for task task.journalnode.journalnode.NodeExecutor.1446589948240 of framework 20151103-221029-169978048-5050-901-0000 to [email protected]:5050
2015-11-03_22:34:22.96233 I1103 22:34:22.962057 1668 slave.cpp:2709] Sending acknowledgement for status update TASK_RUNNING (UUID: d3e51e7f-4a0a-4b0a-bdc2-7c05f0cde9a3) for task task.journalnode.journalnode.NodeExecutor.1446589948240 of framework 20151103-221029-169978048-5050-901-0000 to executor(1)@192.168.33.12:44603
2015-11-03_22:34:23.03139 I1103 22:34:23.031255 1667 status_update_manager.cpp:389] Received status update acknowledgement (UUID: d3e51e7f-4a0a-4b0a-bdc2-7c05f0cde9a3) for task task.journalnode.journalnode.NodeExecutor.1446589948240 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:34:23.03144 I1103 22:34:23.031334 1667 status_update_manager.hpp:346] Checkpointing ACK for status update TASK_RUNNING (UUID: d3e51e7f-4a0a-4b0a-bdc2-7c05f0cde9a3) for task task.journalnode.journalnode.NodeExecutor.1446589948240 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:34:32.43837 I1103 22:34:32.438122 1662 slave.cpp:3648] Current disk usage 4.92%. Max allowed age: 5.955365198736122days
2015-11-03_22:35:32.43954 I1103 22:35:32.439317 1666 slave.cpp:3648] Current disk usage 4.92%. Max allowed age: 5.955878043914722days
2015-11-03_22:36:32.45008 I1103 22:36:32.449903 1665 slave.cpp:3648] Current disk usage 4.92%. Max allowed age: 5.955854603622627days
2015-11-03_22:36:44.94282 2015-11-03 22:36:44,942:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 111ms
2015-11-03_22:36:48.33689 2015-11-03 22:36:48,336:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 59ms
2015-11-03_22:37:32.45269 I1103 22:37:32.452337 1668 slave.cpp:3648] Current disk usage 4.92%. Max allowed age: 5.955826901459236days
2015-11-03_22:37:35.19007 2015-11-03 22:37:35,189:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 137ms
2015-11-03_22:38:32.45511 I1103 22:38:32.454891 1664 slave.cpp:3648] Current disk usage 4.92%. Max allowed age: 5.955798488983970days
2015-11-03_22:38:45.32138 2015-11-03 22:38:45,321:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 51ms
2015-11-03_22:39:05.41440 2015-11-03 22:39:05,414:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 70ms
2015-11-03_22:39:32.45629 I1103 22:39:32.456070 1668 slave.cpp:3648] Current disk usage 4.92%. Max allowed age: 5.955775048691875days
2015-11-03_22:40:32.45675 I1103 22:40:32.456569 1664 slave.cpp:3648] Current disk usage 4.92%. Max allowed age: 5.955747346528484days
2015-11-03_22:40:38.88019 2015-11-03 22:40:38,879:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 11ms
2015-11-03_22:40:49.01236 2015-11-03 22:40:49,011:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 120ms
2015-11-03_22:41:32.45942 I1103 22:41:32.459332 1665 slave.cpp:3648] Current disk usage 4.92%. Max allowed age: 5.955724616548276days
2015-11-03_22:42:09.15752 2015-11-03 22:42:09,157:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 43ms
2015-11-03_22:42:16.49725 I1103 22:42:16.497176 1664 slave.cpp:1144] Got assigned task task.namenode.namenode.NameNodeExecutor.1446590536443 for framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:42:16.49761 I1103 22:42:16.497565 1664 slave.cpp:1144] Got assigned task task.zkfc.namenode.NameNodeExecutor.1446590536443 for framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:42:16.49778 I1103 22:42:16.497750 1664 slave.cpp:1254] Launching task task.namenode.namenode.NameNodeExecutor.1446590536443 for framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:42:16.50457 I1103 22:42:16.504530 1664 slave.cpp:4208] Launching executor executor.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 in work directory '/tmp/slaves/20151103-202237-169978048-5050-1873-S1/frameworks/20151103-221029-169978048-5050-901-0000/executors/executor.namenode.NameNodeExecutor.1446590536443/runs/52c51d3b-854a-47f6-936e-bde752d080d8'
2015-11-03_22:42:16.50615 I1103 22:42:16.505013 1666 docker.cpp:772] No container info found, skipping launch
2015-11-03_22:42:16.50622 I1103 22:42:16.505082 1664 slave.cpp:1401] Queuing task 'task.namenode.namenode.NameNodeExecutor.1446590536443' for executor executor.namenode.NameNodeExecutor.1446590536443 of framework '20151103-221029-169978048-5050-901-0000
2015-11-03_22:42:16.50626 I1103 22:42:16.505121 1664 slave.cpp:1254] Launching task task.zkfc.namenode.NameNodeExecutor.1446590536443 for framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:42:16.50629 I1103 22:42:16.505133 1666 containerizer.cpp:484] Starting container '52c51d3b-854a-47f6-936e-bde752d080d8' for executor 'executor.namenode.NameNodeExecutor.1446590536443' of framework '20151103-221029-169978048-5050-901-0000'
2015-11-03_22:42:16.50643 I1103 22:42:16.505437 1664 slave.cpp:1401] Queuing task 'task.zkfc.namenode.NameNodeExecutor.1446590536443' for executor executor.namenode.NameNodeExecutor.1446590536443 of framework '20151103-221029-169978048-5050-901-0000
2015-11-03_22:42:16.50665 I1103 22:42:16.506427 1666 launcher.cpp:130] Forked child with pid '3568' for container '52c51d3b-854a-47f6-936e-bde752d080d8'
2015-11-03_22:42:16.50669 I1103 22:42:16.506613 1666 containerizer.cpp:694] Checkpointing executor's forked pid 3568 to '/tmp/meta/slaves/20151103-202237-169978048-5050-1873-S1/frameworks/20151103-221029-169978048-5050-901-0000/executors/executor.namenode.NameNodeExecutor.1446590536443/runs/52c51d3b-854a-47f6-936e-bde752d080d8/pids/forked.pid'
2015-11-03_22:42:16.51901 I1103 22:42:16.517597 1667 fetcher.cpp:238] Fetching URIs using command '/usr/libexec/mesos/mesos-fetcher'
2015-11-03_22:42:32.46823 I1103 22:42:32.468086 1662 slave.cpp:3648] Current disk usage 5.30%. Max allowed age: 5.929105678781134days
2015-11-03_22:43:05.90824 2015-11-03 22:43:05,907:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 14ms
2015-11-03_22:43:32.47098 I1103 22:43:32.470630 1663 slave.cpp:3648] Current disk usage 5.37%. Max allowed age: 5.923800359336632days
2015-11-03_22:43:57.54303 I1103 22:43:57.542760 1667 slave.cpp:3165] Monitoring executor 'executor.namenode.NameNodeExecutor.1446590536443' of framework '20151103-221029-169978048-5050-901-0000' in container '52c51d3b-854a-47f6-936e-bde752d080d8'
2015-11-03_22:43:58.68386 I1103 22:43:58.683781 1669 slave.cpp:2164] Got registration for executor 'executor.namenode.NameNodeExecutor.1446590536443' of framework 20151103-221029-169978048-5050-901-0000 from executor(1)@192.168.33.12:41694
2015-11-03_22:43:58.68520 I1103 22:43:58.685139 1669 slave.cpp:1555] Sending queued task 'task.namenode.namenode.NameNodeExecutor.1446590536443' to executor 'executor.namenode.NameNodeExecutor.1446590536443' of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:43:58.68559 I1103 22:43:58.685538 1669 slave.cpp:1555] Sending queued task 'task.zkfc.namenode.NameNodeExecutor.1446590536443' to executor 'executor.namenode.NameNodeExecutor.1446590536443' of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:43:58.93193 I1103 22:43:58.931197 1666 slave.cpp:2531] Handling status update TASK_RUNNING (UUID: d1681770-3994-432c-a0bf-68d26b52f50f) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 from executor(1)@192.168.33.12:41694
2015-11-03_22:43:58.93198 I1103 22:43:58.931314 1666 status_update_manager.cpp:317] Received status update TASK_RUNNING (UUID: d1681770-3994-432c-a0bf-68d26b52f50f) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:43:58.93199 I1103 22:43:58.931509 1666 status_update_manager.hpp:346] Checkpointing UPDATE for status update TASK_RUNNING (UUID: d1681770-3994-432c-a0bf-68d26b52f50f) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:43:58.93509 I1103 22:43:58.934931 1663 slave.cpp:2531] Handling status update TASK_RUNNING (UUID: b8f29b28-5482-4c0b-941a-95c59627999b) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 from executor(1)@192.168.33.12:41694
2015-11-03_22:43:58.93691 I1103 22:43:58.936573 1666 status_update_manager.cpp:317] Received status update TASK_RUNNING (UUID: b8f29b28-5482-4c0b-941a-95c59627999b) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:43:58.93693 I1103 22:43:58.936736 1666 status_update_manager.hpp:346] Checkpointing UPDATE for status update TASK_RUNNING (UUID: b8f29b28-5482-4c0b-941a-95c59627999b) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:43:58.93741 I1103 22:43:58.937026 1663 slave.cpp:2776] Forwarding the update TASK_RUNNING (UUID: d1681770-3994-432c-a0bf-68d26b52f50f) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 to [email protected]:5050
2015-11-03_22:43:58.93742 I1103 22:43:58.937126 1663 slave.cpp:2709] Sending acknowledgement for status update TASK_RUNNING (UUID: d1681770-3994-432c-a0bf-68d26b52f50f) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 to executor(1)@192.168.33.12:41694
2015-11-03_22:43:58.94042 I1103 22:43:58.940106 1666 slave.cpp:2776] Forwarding the update TASK_RUNNING (UUID: b8f29b28-5482-4c0b-941a-95c59627999b) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 to [email protected]:5050
2015-11-03_22:43:58.94044 I1103 22:43:58.940207 1666 slave.cpp:2709] Sending acknowledgement for status update TASK_RUNNING (UUID: b8f29b28-5482-4c0b-941a-95c59627999b) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 to executor(1)@192.168.33.12:41694
2015-11-03_22:43:59.00595 I1103 22:43:59.005162 1663 status_update_manager.cpp:389] Received status update acknowledgement (UUID: d1681770-3994-432c-a0bf-68d26b52f50f) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:43:59.00597 I1103 22:43:59.005221 1663 status_update_manager.hpp:346] Checkpointing ACK for status update TASK_RUNNING (UUID: d1681770-3994-432c-a0bf-68d26b52f50f) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:43:59.06186 I1103 22:43:59.061506 1663 status_update_manager.cpp:389] Received status update acknowledgement (UUID: b8f29b28-5482-4c0b-941a-95c59627999b) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:43:59.06187 I1103 22:43:59.061559 1663 status_update_manager.hpp:346] Checkpointing ACK for status update TASK_RUNNING (UUID: b8f29b28-5482-4c0b-941a-95c59627999b) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:44:06.00495 2015-11-03 22:44:06,004:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 22ms
2015-11-03_22:44:22.70103 2015-11-03 22:44:22,700:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 14ms
2015-11-03_22:44:32.47205 I1103 22:44:32.471585 1663 slave.cpp:3648] Current disk usage 5.75%. Max allowed age: 5.897817150703623days
2015-11-03_22:44:46.10124 2015-11-03 22:44:46,101:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 45ms
2015-11-03_22:45:32.47997 I1103 22:45:32.479790 1662 slave.cpp:3648] Current disk usage 5.75%. Max allowed age: 5.897784476357060days
2015-11-03_22:45:39.51854 2015-11-03 22:45:39,518:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 28ms
2015-11-03_22:46:13.02506 2015-11-03 22:46:13,024:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 140ms
2015-11-03_22:46:32.51982 I1103 22:46:32.519547 1668 slave.cpp:3648] Current disk usage 5.75%. Max allowed age: 5.897743988579803days
2015-11-03_22:46:46.44861 2015-11-03 22:46:46,448:1657(0x7f5ecdfe0700):ZOO_WARN@zookeeper_interest@1557: Exceeded deadline by 60ms
2015-11-03_22:47:32.52039 I1103 22:47:32.520289 1669 slave.cpp:3648] Current disk usage 5.75%. Max allowed age: 5.897699949243137days
2015-11-03_22:48:26.27921 I1103 22:48:26.278497 1663 slave.cpp:2531] Handling status update TASK_KILLED (UUID: 4ec06aaf-2c53-4593-8ecb-0d041c2f0437) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 from executor(1)@192.168.33.12:41694
2015-11-03_22:48:26.27924 I1103 22:48:26.278674 1663 slave.cpp:2531] Handling status update TASK_KILLED (UUID: 503988d5-0d01-456e-bf21-9779ced4659a) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 from executor(1)@192.168.33.12:41694
2015-11-03_22:48:26.27925 I1103 22:48:26.278935 1663 status_update_manager.cpp:317] Received status update TASK_KILLED (UUID: 4ec06aaf-2c53-4593-8ecb-0d041c2f0437) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:48:26.27926 I1103 22:48:26.278962 1663 status_update_manager.hpp:346] Checkpointing UPDATE for status update TASK_KILLED (UUID: 4ec06aaf-2c53-4593-8ecb-0d041c2f0437) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:48:26.28469 I1103 22:48:26.283834 1663 status_update_manager.cpp:317] Received status update TASK_KILLED (UUID: 503988d5-0d01-456e-bf21-9779ced4659a) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:48:26.28472 I1103 22:48:26.283860 1663 status_update_manager.hpp:346] Checkpointing UPDATE for status update TASK_KILLED (UUID: 503988d5-0d01-456e-bf21-9779ced4659a) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:48:26.28473 I1103 22:48:26.284127 1667 slave.cpp:2776] Forwarding the update TASK_KILLED (UUID: 4ec06aaf-2c53-4593-8ecb-0d041c2f0437) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 to [email protected]:5050
2015-11-03_22:48:26.28473 I1103 22:48:26.284198 1667 slave.cpp:2709] Sending acknowledgement for status update TASK_KILLED (UUID: 4ec06aaf-2c53-4593-8ecb-0d041c2f0437) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 to executor(1)@192.168.33.12:41694
2015-11-03_22:48:26.28693 I1103 22:48:26.286715 1663 slave.cpp:2776] Forwarding the update TASK_KILLED (UUID: 503988d5-0d01-456e-bf21-9779ced4659a) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 to [email protected]:5050
2015-11-03_22:48:26.28694 I1103 22:48:26.286774 1663 slave.cpp:2709] Sending acknowledgement for status update TASK_KILLED (UUID: 503988d5-0d01-456e-bf21-9779ced4659a) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000 to executor(1)@192.168.33.12:41694
2015-11-03_22:48:26.31790 I1103 22:48:26.317584 1663 status_update_manager.cpp:389] Received status update acknowledgement (UUID: 4ec06aaf-2c53-4593-8ecb-0d041c2f0437) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:48:26.31796 I1103 22:48:26.317630 1663 status_update_manager.hpp:346] Checkpointing ACK for status update TASK_KILLED (UUID: 4ec06aaf-2c53-4593-8ecb-0d041c2f0437) for task task.namenode.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:48:26.34322 I1103 22:48:26.342957 1663 status_update_manager.cpp:389] Received status update acknowledgement (UUID: 503988d5-0d01-456e-bf21-9779ced4659a) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:48:26.34328 I1103 22:48:26.342993 1663 status_update_manager.hpp:346] Checkpointing ACK for status update TASK_KILLED (UUID: 503988d5-0d01-456e-bf21-9779ced4659a) for task task.zkfc.namenode.NameNodeExecutor.1446590536443 of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:48:26.65705 I1103 22:48:26.656960 1667 containerizer.cpp:1123] Executor for container '52c51d3b-854a-47f6-936e-bde752d080d8' has exited
2015-11-03_22:48:26.65721 I1103 22:48:26.657150 1667 containerizer.cpp:918] Destroying container '52c51d3b-854a-47f6-936e-bde752d080d8'
2015-11-03_22:48:26.71774 I1103 22:48:26.717645 1667 slave.cpp:3223] Executor 'executor.namenode.NameNodeExecutor.1446590536443' of framework 20151103-221029-169978048-5050-901-0000 exited with status 3
2015-11-03_22:48:26.71815 I1103 22:48:26.718092 1667 slave.cpp:3332] Cleaning up executor 'executor.namenode.NameNodeExecutor.1446590536443' of framework 20151103-221029-169978048-5050-901-0000
2015-11-03_22:48:26.71883 I1103 22:48:26.718670 1665 gc.cpp:56] Scheduling '/tmp/slaves/20151103-202237-169978048-5050-1873-S1/frameworks/20151103-221029-169978048-5050-901-0000/executors/executor.namenode.NameNodeExecutor.1446590536443/runs/52c51d3b-854a-47f6-936e-bde752d080d8' for gc 6.99999168454815days in the future
2015-11-03_22:48:26.71885 I1103 22:48:26.718761 1665 gc.cpp:56] Scheduling '/tmp/slaves/20151103-202237-169978048-5050-1873-S1/frameworks/20151103-221029-169978048-5050-901-0000/executors/executor.namenode.NameNodeExecutor.1446590536443' for gc 6.99999168194667days in the future
2015-11-03_22:48:26.71887 I1103 22:48:26.718811 1665 gc.cpp:56] Scheduling '/tmp/meta/slaves/20151103-202237-169978048-5050-1873-S1/frameworks/20151103-221029-169978048-5050-901-0000/executors/executor.namenode.NameNodeExecutor.1446590536443/runs/52c51d3b-854a-47f6-936e-bde752d080d8' for gc 6.99999168142519days in the future
2015-11-03_22:48:26.71894 I1103 22:48:26.718849 1665 gc.cpp:56] Scheduling '/tmp/meta/slaves/20151103-202237-169978048-5050-1873-S1/frameworks/20151103-221029-169978048-5050-901-0000/executors/executor.namenode.NameNodeExecutor.1446590536443' for gc 6.99999168104days in the future
I think you are right that Banno/vagrant-mesos may have some bugs. The mesos-slave runs tasks in Docker containers; I checked past issues, and it might not work. If you could share your Vagrantfiles, I would be very grateful. I have tried another one listed on Mesosphere that uses Chef, but since I'm not familiar with Chef, I'd prefer one that only has shell provisioners.
I see this: [docker.cpp:772] No container info found, skipping launch. It seems to be a bug in the Vagrant file or somewhere in the setup, because tasks are not finding the containers. One possibility is that the --executor_registration_timeout slave flag is not set high enough.
As far as different Vagrant files go, there are a number of them that we use. ContainerSolutions minimesos is one, https://github.com/mesosphere/playa-mesos is another, and there is another for DCOS, https://downloads.mesosphere.com/dcos/alpha/make_dcos_vagrant.sh. Some of these need to be modified to have enough slaves to run HDFS. To test, we also use DCOS on AWS, or create our own Mesos cluster in the cloud or on premises.
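A minimal sketch of raising that timeout on each slave, assuming the Mesosphere packaging that reads one flag per file under /etc/mesos-slave/ (the 5mins value is only an example):

# Hedged sketch: raise the executor registration timeout on a slave and restart it.
echo '5mins' | sudo tee /etc/mesos-slave/executor_registration_timeout
sudo service mesos-slave restart
# Equivalent flag form when launching by hand:
#   mesos-slave --master=zk://mesos-master.vagrant:2181/mesos --executor_registration_timeout=5mins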
@elingg hi, I tried another Vagrantfile, but I'm still stuck in the NAMENODE state. Here are the new errors from the NameNode executor. The hosts' hostname files are all right. Mesos is 0.22.1, the JDK is openjdk-7-jdk.
FO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/11/05 06:17:04 INFO namenode.NameNode: createNameNode [-bootstrapStandby, -force]
15/11/05 06:17:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/11/05 06:17:05 INFO ipc.Client: Retrying connect to server: mesos-slave1/192.168.33.11:50071. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:05 WARN ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at mesos-slave1/192.168.33.11:50071: Call From mesos-slave1/192.168.33.11 to mesos-slave1:50071 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
15/11/05 06:17:05 INFO ha.HealthMonitor: Entering state SERVICE_NOT_RESPONDING
15/11/05 06:17:05 INFO ha.ZKFailoverController: Local service NameNode at mesos-slave1/192.168.33.11:50071 entered state: SERVICE_NOT_RESPONDING
15/11/05 06:17:05 INFO ha.ZKFailoverController: Quitting master election for NameNode at mesos-slave1/192.168.33.11:50071 and marking that fencing is necessary
15/11/05 06:17:05 INFO ha.ActiveStandbyElector: Yielding from election
15/11/05 06:17:05 INFO zookeeper.ClientCnxn: EventThread shut down
15/11/05 06:17:05 INFO zookeeper.ZooKeeper: Session: 0x150d63e8292001f closed
15/11/05 06:17:07 INFO ipc.Client: Retrying connect to server: mesos-slave1/192.168.33.11:50071. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:07 WARN ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at mesos-slave1/192.168.33.11:50071: Call From mesos-slave1/192.168.33.11 to mesos-slave1:50071 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
15/11/05 06:17:09 INFO ipc.Client: Retrying connect to server: mesos-slave1/192.168.33.11:50071. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:09 WARN ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at mesos-slave1/192.168.33.11:50071: Call From mesos-slave1/192.168.33.11 to mesos-slave1:50071 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
15/11/05 06:17:11 INFO ipc.Client: Retrying connect to server: mesos-slave1/192.168.33.11:50071. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:11 WARN ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at mesos-slave1/192.168.33.11:50071: Call From mesos-slave1/192.168.33.11 to mesos-slave1:50071 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
15/11/05 06:17:13 INFO ipc.Client: Retrying connect to server: mesos-slave1/192.168.33.11:50071. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:13 WARN ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at mesos-slave1/192.168.33.11:50071: Call From mesos-slave1/192.168.33.11 to mesos-slave1:50071 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
15/11/05 06:17:13 INFO ipc.Client: Retrying connect to server: mesos-slave2/192.168.33.12:50071. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:14 INFO ipc.Client: Retrying connect to server: mesos-slave2/192.168.33.12:50071. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:15 INFO ipc.Client: Retrying connect to server: mesos-slave1/192.168.33.11:50071. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:15 WARN ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at mesos-slave1/192.168.33.11:50071: Call From mesos-slave1/192.168.33.11 to mesos-slave1:50071 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
15/11/05 06:17:15 INFO ipc.Client: Retrying connect to server: mesos-slave2/192.168.33.12:50071. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:16 INFO ipc.Client: Retrying connect to server: mesos-slave2/192.168.33.12:50071. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:17 INFO ipc.Client: Retrying connect to server: mesos-slave1/192.168.33.11:50071. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:17 WARN ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at mesos-slave1/192.168.33.11:50071: Call From mesos-slave1/192.168.33.11 to mesos-slave1:50071 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
15/11/05 06:17:17 INFO ipc.Client: Retrying connect to server: mesos-slave2/192.168.33.12:50071. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:18 INFO ipc.Client: Retrying connect to server: mesos-slave2/192.168.33.12:50071. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:19 INFO ipc.Client: Retrying connect to server: mesos-slave1/192.168.33.11:50071. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:19 WARN ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at mesos-slave1/192.168.33.11:50071: Call From mesos-slave1/192.168.33.11 to mesos-slave1:50071 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
15/11/05 06:17:19 INFO ipc.Client: Retrying connect to server: mesos-slave2/192.168.33.12:50071. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:20 INFO ipc.Client: Retrying connect to server: mesos-slave2/192.168.33.12:50071. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:21 INFO ipc.Client: Retrying connect to server: mesos-slave1/192.168.33.11:50071. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:21 WARN ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at mesos-slave1/192.168.33.11:50071: Call From mesos-slave1/192.168.33.11 to mesos-slave1:50071 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
15/11/05 06:17:21 INFO ipc.Client: Retrying connect to server: mesos-slave2/192.168.33.12:50071. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:22 INFO ipc.Client: Retrying connect to server: mesos-slave2/192.168.33.12:50071. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
15/11/05 06:17:22 FATAL ha.BootstrapStandby: Unable to fetch namespace information from active NN at mesos-slave2/192.168.33.12:50071: Call From mesos-slave1/192.168.33.11 to mesos-slave2:50071 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
15/11/05 06:17:22 INFO util.ExitUtil: Exiting with status 2
15/11/05 06:17:22 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at mesos-slave1/192.168.33.11
************************************************************/
Anyone here, please help me! I've been working on this for 4 days and it's still not working.
Looks like a networking error. Another thought: make sure you are pointing to the correct IPs for your mesos.hdfs.zkfc.ha.zookeeper.quorum. If that is not correct, you may not be able to connect to your ZKFC.
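A minimal sketch of checking that the quorum host from the config above is resolvable and answering, run from one of the slaves (ruok is ZooKeeper's standard four-letter health check):

# Hedged sketch: confirm the ZooKeeper quorum host resolves and answers on 2181.
getent hosts mesos-master.vagrant                 # does it resolve to the IP you expect?
echo ruok | nc -w 2 mesos-master.vagrant 2181     # a healthy ZooKeeper replies "imok"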
I do have a very easy option for you, though. Install DCOS and you can install HDFS with a single command: dcos package install hdfs. Here is the DCOS link: https://mesosphere.com/product/
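For reference, that single command, assuming the DCOS CLI is already installed and pointed at the cluster:

# As mentioned above, with the DCOS CLI configured against your cluster:
dcos package install hdfs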
try with mesos-dns?
Hello, we have a similar problem to the one described above. We use 3 AWS EC2 instances for HDFS; Vagrant is not used. OS: CentOS 7, Java version 1.8.0_77, 16 GB memory. Our mesos-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mesos.hdfs.constraints</name>
<value>role:hdfs</value>
</property>
<property>
<name>mesos.hdfs.data.dir</name>
<description>The primary data directory in HDFS</description>
<value>/workspace/hdfs/data/data</value>
</property>
<!-- Uncomment this to enable secondary data dir
<property>
<name>mesos.hdfs.secondary.data.dir</name>
<description>The secondary data directory in HDFS</description>
<value>/workspace/hdfs/data2</value>
</property>
-->
<property>
<name>mesos.hdfs.domain.socket.dir</name>
<description>The location used for a local socket used by the data nodes</description>
<value>/var/run/hadoop-hdfs</value>
</property>
<!-- Uncomment this to enable backup
<property>
<name>mesos.hdfs.backup.dir</name>
<description>Backup dir for HDFS</description>
<value>/tmp/nfs</value>
</property>
-->
<property>
<name>mesos.hdfs.native-hadoop-binaries</name>
<description>Mark true if you have hadoop pre-installed on your host machines (otherwise it will be distributed by the scheduler)</description>
<value>false</value>
</property>
<property>
<name>mesos.hdfs.framework.mnt.path</name>
<description>Mount location (if mesos.hdfs.native-hadoop-binaries is marked false)</description>
<value>/opt/mesosphere</value>
</property>
<property>
<name>mesos.hdfs.state.zk</name>
<description>Comma-separated hostname-port pairs of zookeeper node locations for HDFS framework state information</description>
<value>mesosmaster01.rosen.local:2181,mesosmaster02.rosen.local:2181,mesosmaster03.rosen.local:2181</value>
</property>
<property>
<name>mesos.master.uri</name>
<description>Zookeeper entry for mesos master location</description>
<value>zk://mesosmaster01.rosen.local:2181,mesosmaster02.rosen.local:2181,mesosmaster03.rosen.local:2181/mesos</value>
</property>
<property>
<name>mesos.hdfs.zkfc.ha.zookeeper.quorum</name>
<description>Comma-separated list of zookeeper hostname-port pairs for HDFS HA features</description>
<value>mesosmaster01.rosen.local:2181,mesosmaster02.rosen.local:2181,mesosmaster03.rosen.local:2181</value>
</property>
<property>
<name>mesos.hdfs.framework.name</name>
<description>Your Mesos framework name and cluster name when accessing files (hdfs://YOUR_NAME)</description>
<value>hdfs</value>
</property>
<property>
<name>mesos.hdfs.mesosdns</name>
<description>Whether to use Mesos DNS for service discovery within HDFS</description>
<value>true</value>
</property>
<property>
<name>mesos.hdfs.mesosdns.domain</name>
<description>Root domain name of Mesos DNS (usually 'mesos')</description>
<value>rosen.local</value>
</property>
<property>
<name>mesos.native.library</name>
<description>Location of libmesos.so</description>
<value>/usr/local/lib/libmesos.so</value>
</property>
<property>
<name>mesos.hdfs.journalnode.count</name>
<description>Number of journal nodes (must be odd)</description>
<value>1</value>
</property>
<property>
<name>mesos.hdfs.executor.cpus</name>
<value>0.2</value>
</property>
<property>
<name>mesos.hdfs.namenode.cpus</name>
<value>0.2</value>
</property>
<property>
<name>mesos.hdfs.journalnode.cpus</name>
<value>0.2</value>
</property>
<property>
<name>mesos.hdfs.datanode.cpus</name>
<value>0.2</value>
</property>
<property>
<name>mesos.hdfs.namenode.heap.size</name>
<value>2048</value>
</property>
<!-- Additional settings for fine-tuning -->
<!--
<property>
<name>mesos.hdfs.jvm.overhead</name>
<description>Multiplier on resources reserved in order to account for JVM allocation</description>
<value>1.35</value>
</property>
<property>
<name>mesos.hdfs.hadoop.heap.size</name>
<value>512</value>
</property>
<property>
<name>mesos.hdfs.namenode.heap.size</name>
<value>4096</value>
</property>
<property>
<name>mesos.hdfs.datanode.heap.size</name>
<value>1024</value>
</property>
<property>
<name>mesos.hdfs.executor.heap.size</name>
<value>256</value>
</property>
<property>
<name>mesos.hdfs.executor.cpus</name>
<value>0.5</value>
</property>
<property>
<name>mesos.hdfs.namenode.cpus</name>
<value>1</value>
</property>
<property>
<name>mesos.hdfs.journalnode.cpus</name>
<value>1</value>
</property>
<property>
<name>mesos.hdfs.datanode.cpus</name>
<value>1</value>
</property>
<property>
<name>mesos.hdfs.user</name>
<value>root</value>
</property>
<property>
<name>mesos.hdfs.role</name>
<value>*</value>
</property>
<property>
<name>mesos.hdfs.datanode.exclusive</name>
WARNING-It is not advisable to run the datanode on same slave because of performance issues
<description>Whether to run the datanode on slave different from namenode and journal nodes</description>
<value>true</value>
</property>
-->
</configuration>
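One thing worth double-checking with mesos.hdfs.constraints set to role:hdfs: the scheduler can only match that constraint if the agents actually advertise a matching attribute. A minimal sketch, assuming the constraint is matched against agent attributes and the Mesosphere packaging (one flag per file under /etc/mesos-slave/):

# Hedged sketch: advertise a role:hdfs attribute on each agent that should run HDFS nodes.
echo 'role:hdfs' | sudo tee /etc/mesos-slave/attributes
sudo systemctl restart mesos-slave        # CentOS 7 / systemd service name from the Mesosphere packages
# Equivalent flag form: mesos-slave ... --attributes="role:hdfs"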
The mesos-slave log:
I0412 10:03:47.383617 1128 slave.cpp:2248] Updating framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 pid to [email protected]:59562
I0412 10:03:47.383980 1128 status_update_manager.cpp:181] Resuming sending status updates
I0412 10:03:52.571465 1129 slave.cpp:1361] Got assigned task task.namenode.namenode.NameNodeExecutor.1460455432500 for framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:52.572115 1129 slave.cpp:1361] Got assigned task task.zkfc.namenode.NameNodeExecutor.1460455432500 for framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:52.572258 1129 slave.cpp:1480] Launching task task.namenode.namenode.NameNodeExecutor.1460455432500 for framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:52.577028 1129 paths.cpp:528] Trying to chown '/workspace/slaves/835221b1-c248-4c7a-b641-6b139b37a855-S37/frameworks/835221b1-c248-4c7a-b641-6b139b37a855-0002/executors/executor.namenode.NameNodeExecutor.1460455432500/runs/b736fd9e-713b-4ce3-8f6f-72489cce55c4' to user 'root'
I0412 10:03:52.587659 1129 slave.cpp:5367] Launching executor executor.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 with resources cpus(*):0.2; mem(*):345.6 in work directory '/workspace/slaves/835221b1-c248-4c7a-b641-6b139b37a855-S37/frameworks/835221b1-c248-4c7a-b641-6b139b37a855-0002/executors/executor.namenode.NameNodeExecutor.1460455432500/runs/b736fd9e-713b-4ce3-8f6f-72489cce55c4'
I0412 10:03:52.589581 1129 slave.cpp:1698] Queuing task 'task.namenode.namenode.NameNodeExecutor.1460455432500' for executor 'executor.namenode.NameNodeExecutor.1460455432500' of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:52.589629 1129 slave.cpp:1480] Launching task task.zkfc.namenode.NameNodeExecutor.1460455432500 for framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:52.589805 1129 slave.cpp:1698] Queuing task 'task.zkfc.namenode.NameNodeExecutor.1460455432500' for executor 'executor.namenode.NameNodeExecutor.1460455432500' of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:52.590296 1128 docker.cpp:1009] No container info found, skipping launch
I0412 10:03:52.590457 1133 containerizer.cpp:666] Starting container 'b736fd9e-713b-4ce3-8f6f-72489cce55c4' for executor 'executor.namenode.NameNodeExecutor.1460455432500' of framework '835221b1-c248-4c7a-b641-6b139b37a855-0002'
I0412 10:03:52.592732 1132 linux_launcher.cpp:304] Cloning child process with flags =
I0412 10:03:52.593842 1132 systemd.cpp:95] Assigned child process '20366' to 'mesos_executors.slice'
I0412 10:03:52.595650 1132 containerizer.cpp:1118] Checkpointing executor's forked pid 20366 to '/workspace/meta/slaves/835221b1-c248-4c7a-b641-6b139b37a855-S37/frameworks/835221b1-c248-4c7a-b641-6b139b37a855-0002/executors/executor.namenode.NameNodeExecutor.1460455432500/runs/b736fd9e-713b-4ce3-8f6f-72489cce55c4/pids/forked.pid'
I0412 10:03:56.420516 1130 slave.cpp:2643] Got registration for executor 'executor.namenode.NameNodeExecutor.1460455432500' of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 from executor(1)@172.32.2.201:51595
I0412 10:03:56.421440 1130 slave.cpp:1863] Sending queued task 'task.namenode.namenode.NameNodeExecutor.1460455432500' to executor 'executor.namenode.NameNodeExecutor.1460455432500' of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 at executor(1)@172.32.2.201:51595
I0412 10:03:56.421576 1130 slave.cpp:1863] Sending queued task 'task.zkfc.namenode.NameNodeExecutor.1460455432500' to executor 'executor.namenode.NameNodeExecutor.1460455432500' of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 at executor(1)@172.32.2.201:51595
I0412 10:03:56.629096 1129 slave.cpp:3002] Handling status update TASK_RUNNING (UUID: f3723e38-75ab-40d1-84ee-5e5e87066460) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 from executor(1)@172.32.2.201:51595
I0412 10:03:56.629263 1129 slave.cpp:3002] Handling status update TASK_RUNNING (UUID: 2e23a614-1365-434d-8149-3bbe82689458) for task task.zkfc.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 from executor(1)@172.32.2.201:51595
I0412 10:03:56.629564 1126 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: f3723e38-75ab-40d1-84ee-5e5e87066460) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:56.629753 1126 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_RUNNING (UUID: f3723e38-75ab-40d1-84ee-5e5e87066460) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:56.637223 1128 slave.cpp:3400] Forwarding the update TASK_RUNNING (UUID: f3723e38-75ab-40d1-84ee-5e5e87066460) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to [email protected]:5050
I0412 10:03:56.637322 1128 slave.cpp:3310] Sending acknowledgement for status update TASK_RUNNING (UUID: f3723e38-75ab-40d1-84ee-5e5e87066460) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to executor(1)@172.32.2.201:51595
I0412 10:03:56.637621 1126 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: 2e23a614-1365-434d-8149-3bbe82689458) for task task.zkfc.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:56.637817 1126 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_RUNNING (UUID: 2e23a614-1365-434d-8149-3bbe82689458) for task task.zkfc.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:56.641366 1130 slave.cpp:3400] Forwarding the update TASK_RUNNING (UUID: 2e23a614-1365-434d-8149-3bbe82689458) for task task.zkfc.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to [email protected]:5050
I0412 10:03:56.641475 1130 slave.cpp:3310] Sending acknowledgement for status update TASK_RUNNING (UUID: 2e23a614-1365-434d-8149-3bbe82689458) for task task.zkfc.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to executor(1)@172.32.2.201:51595
I0412 10:03:56.667753 1133 status_update_manager.cpp:392] Received status update acknowledgement (UUID: f3723e38-75ab-40d1-84ee-5e5e87066460) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:56.667856 1133 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_RUNNING (UUID: f3723e38-75ab-40d1-84ee-5e5e87066460) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:56.695082 1133 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 2e23a614-1365-434d-8149-3bbe82689458) for task task.zkfc.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:56.695195 1133 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_RUNNING (UUID: 2e23a614-1365-434d-8149-3bbe82689458) for task task.zkfc.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:57.437192 1133 slave.cpp:4374] Current disk usage 13.61%. Max allowed age: 5.347110459506319days
I0412 10:03:58.121980 1126 slave.cpp:3002] Handling status update TASK_FAILED (UUID: 05daa2cf-ce01-41a7-b6f2-12d218f67330) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 from executor(1)@172.32.2.201:51595
I0412 10:03:58.122598 1130 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: 05daa2cf-ce01-41a7-b6f2-12d218f67330) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:58.122643 1130 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_FAILED (UUID: 05daa2cf-ce01-41a7-b6f2-12d218f67330) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:58.131938 1126 slave.cpp:3400] Forwarding the update TASK_FAILED (UUID: 05daa2cf-ce01-41a7-b6f2-12d218f67330) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to [email protected]:5050
I0412 10:03:58.132037 1126 slave.cpp:3310] Sending acknowledgement for status update TASK_FAILED (UUID: 05daa2cf-ce01-41a7-b6f2-12d218f67330) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to executor(1)@172.32.2.201:51595
I0412 10:03:58.154698 1133 slave.cpp:3002] Handling status update TASK_RUNNING (UUID: 12c08871-27cd-4b90-9b51-7d1ef9d99801) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 from executor(1)@172.32.2.201:51595
I0412 10:03:58.155057 1127 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: 12c08871-27cd-4b90-9b51-7d1ef9d99801) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:58.155128 1127 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_RUNNING (UUID: 12c08871-27cd-4b90-9b51-7d1ef9d99801) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:58.159402 1127 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 05daa2cf-ce01-41a7-b6f2-12d218f67330) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:58.159435 1133 slave.cpp:3310] Sending acknowledgement for status update TASK_RUNNING (UUID: 12c08871-27cd-4b90-9b51-7d1ef9d99801) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to executor(1)@172.32.2.201:51595
I0412 10:03:58.159457 1127 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_FAILED (UUID: 05daa2cf-ce01-41a7-b6f2-12d218f67330) for task task.namenode.namenode.NameNodeExecutor.1460455432500 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:58.560004 1130 slave.cpp:1361] Got assigned task task.namenode.namenode.NameNodeExecutor.1460455438505 for framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:58.560318 1130 slave.cpp:1361] Got assigned task task.zkfc.namenode.NameNodeExecutor.1460455438505 for framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:58.560469 1130 slave.cpp:1480] Launching task task.namenode.namenode.NameNodeExecutor.1460455438505 for framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:58.560714 1130 paths.cpp:528] Trying to chown '/workspace/slaves/835221b1-c248-4c7a-b641-6b139b37a855-S37/frameworks/835221b1-c248-4c7a-b641-6b139b37a855-0002/executors/executor.namenode.NameNodeExecutor.1460455438505/runs/0fcf4e38-32b4-40cb-a8b7-e810ea95c7fa' to user 'root'
I0412 10:03:58.564749 1130 slave.cpp:5367] Launching executor executor.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 with resources cpus(*):0.2; mem(*):345.6 in work directory '/workspace/slaves/835221b1-c248-4c7a-b641-6b139b37a855-S37/frameworks/835221b1-c248-4c7a-b641-6b139b37a855-0002/executors/executor.namenode.NameNodeExecutor.1460455438505/runs/0fcf4e38-32b4-40cb-a8b7-e810ea95c7fa'
I0412 10:03:58.565224 1130 slave.cpp:1698] Queuing task 'task.namenode.namenode.NameNodeExecutor.1460455438505' for executor 'executor.namenode.NameNodeExecutor.1460455438505' of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:58.565279 1130 slave.cpp:1480] Launching task task.zkfc.namenode.NameNodeExecutor.1460455438505 for framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:58.565349 1127 docker.cpp:1009] No container info found, skipping launch
I0412 10:03:58.565481 1127 containerizer.cpp:666] Starting container '0fcf4e38-32b4-40cb-a8b7-e810ea95c7fa' for executor 'executor.namenode.NameNodeExecutor.1460455438505' of framework '835221b1-c248-4c7a-b641-6b139b37a855-0002'
I0412 10:03:58.565546 1130 slave.cpp:1698] Queuing task 'task.zkfc.namenode.NameNodeExecutor.1460455438505' for executor 'executor.namenode.NameNodeExecutor.1460455438505' of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:03:58.567698 1127 linux_launcher.cpp:304] Cloning child process with flags =
I0412 10:03:58.569170 1127 systemd.cpp:95] Assigned child process '20614' to 'mesos_executors.slice'
I0412 10:03:58.570055 1127 containerizer.cpp:1118] Checkpointing executor's forked pid 20614 to '/workspace/meta/slaves/835221b1-c248-4c7a-b641-6b139b37a855-S37/frameworks/835221b1-c248-4c7a-b641-6b139b37a855-0002/executors/executor.namenode.NameNodeExecutor.1460455438505/runs/0fcf4e38-32b4-40cb-a8b7-e810ea95c7fa/pids/forked.pid'
I0412 10:04:02.514757 1131 slave.cpp:2643] Got registration for executor 'executor.namenode.NameNodeExecutor.1460455438505' of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 from executor(1)@172.32.2.201:40841
I0412 10:04:02.515738 1131 slave.cpp:1863] Sending queued task 'task.namenode.namenode.NameNodeExecutor.1460455438505' to executor 'executor.namenode.NameNodeExecutor.1460455438505' of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 at executor(1)@172.32.2.201:40841
I0412 10:04:02.515851 1131 slave.cpp:1863] Sending queued task 'task.zkfc.namenode.NameNodeExecutor.1460455438505' to executor 'executor.namenode.NameNodeExecutor.1460455438505' of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 at executor(1)@172.32.2.201:40841
I0412 10:04:02.716392 1129 slave.cpp:3002] Handling status update TASK_RUNNING (UUID: d207da47-d11a-47dd-b69a-d4ba54465daf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 from executor(1)@172.32.2.201:40841
I0412 10:04:02.716552 1129 slave.cpp:3002] Handling status update TASK_RUNNING (UUID: 50c6c76a-40d6-4a5f-9c78-68eac6b0f5f9) for task task.zkfc.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 from executor(1)@172.32.2.201:40841
I0412 10:04:02.716835 1131 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: d207da47-d11a-47dd-b69a-d4ba54465daf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:02.717103 1131 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_RUNNING (UUID: d207da47-d11a-47dd-b69a-d4ba54465daf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:02.725347 1129 slave.cpp:3400] Forwarding the update TASK_RUNNING (UUID: d207da47-d11a-47dd-b69a-d4ba54465daf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to [email protected]:5050
I0412 10:04:02.725388 1131 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: 50c6c76a-40d6-4a5f-9c78-68eac6b0f5f9) for task task.zkfc.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:02.725458 1129 slave.cpp:3310] Sending acknowledgement for status update TASK_RUNNING (UUID: d207da47-d11a-47dd-b69a-d4ba54465daf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to executor(1)@172.32.2.201:40841
I0412 10:04:02.725586 1131 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_RUNNING (UUID: 50c6c76a-40d6-4a5f-9c78-68eac6b0f5f9) for task task.zkfc.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:02.729430 1127 slave.cpp:3400] Forwarding the update TASK_RUNNING (UUID: 50c6c76a-40d6-4a5f-9c78-68eac6b0f5f9) for task task.zkfc.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to [email protected]:5050
I0412 10:04:02.729521 1127 slave.cpp:3310] Sending acknowledgement for status update TASK_RUNNING (UUID: 50c6c76a-40d6-4a5f-9c78-68eac6b0f5f9) for task task.zkfc.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to executor(1)@172.32.2.201:40841
I0412 10:04:02.755759 1129 status_update_manager.cpp:392] Received status update acknowledgement (UUID: d207da47-d11a-47dd-b69a-d4ba54465daf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:02.755856 1129 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_RUNNING (UUID: d207da47-d11a-47dd-b69a-d4ba54465daf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:02.798437 1129 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 50c6c76a-40d6-4a5f-9c78-68eac6b0f5f9) for task task.zkfc.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:02.798527 1129 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_RUNNING (UUID: 50c6c76a-40d6-4a5f-9c78-68eac6b0f5f9) for task task.zkfc.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:04.234256 1130 slave.cpp:3002] Handling status update TASK_FAILED (UUID: 5e6ef534-21e2-4b4d-b9a7-0c79517c42bf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 from executor(1)@172.32.2.201:40841
I0412 10:04:04.234877 1130 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: 5e6ef534-21e2-4b4d-b9a7-0c79517c42bf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:04.234915 1130 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_FAILED (UUID: 5e6ef534-21e2-4b4d-b9a7-0c79517c42bf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:04.240129 1128 slave.cpp:3400] Forwarding the update TASK_FAILED (UUID: 5e6ef534-21e2-4b4d-b9a7-0c79517c42bf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to [email protected]:5050
I0412 10:04:04.240219 1128 slave.cpp:3310] Sending acknowledgement for status update TASK_FAILED (UUID: 5e6ef534-21e2-4b4d-b9a7-0c79517c42bf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to executor(1)@172.32.2.201:40841
I0412 10:04:04.254961 1130 slave.cpp:3002] Handling status update TASK_RUNNING (UUID: 0f54ea2b-b8e1-4e96-b386-b2ae9732305a) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 from executor(1)@172.32.2.201:40841
I0412 10:04:04.255317 1130 status_update_manager.cpp:320] Received status update TASK_RUNNING (UUID: 0f54ea2b-b8e1-4e96-b386-b2ae9732305a) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:04.255357 1130 status_update_manager.cpp:824] Checkpointing UPDATE for status update TASK_RUNNING (UUID: 0f54ea2b-b8e1-4e96-b386-b2ae9732305a) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:04.259516 1130 slave.cpp:3310] Sending acknowledgement for status update TASK_RUNNING (UUID: 0f54ea2b-b8e1-4e96-b386-b2ae9732305a) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002 to executor(1)@172.32.2.201:40841
I0412 10:04:04.274070 1129 status_update_manager.cpp:392] Received status update acknowledgement (UUID: 5e6ef534-21e2-4b4d-b9a7-0c79517c42bf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:04.274127 1129 status_update_manager.cpp:824] Checkpointing ACK for status update TASK_FAILED (UUID: 5e6ef534-21e2-4b4d-b9a7-0c79517c42bf) for task task.namenode.namenode.NameNodeExecutor.1460455438505 of framework 835221b1-c248-4c7a-b641-6b139b37a855-0002
I0412 10:04:57.438268 1129 slave.cpp:4374] Current disk usage 13.93%. Max allowed age: 5.324971961824769days
The hdfs-mesos scheduler output is:
10:03:47.303 [main] INFO org.eclipse.jetty.util.log - Logging initialized @604ms
10:03:47.337 [main] INFO org.eclipse.jetty.server.Server - jetty-9.2.z-SNAPSHOT
10:03:47.357 [HdfsScheduler] INFO o.a.m.hdfs.scheduler.HdfsScheduler - user: "root"
name: "hdfs"
id {
value: "835221b1-c248-4c7a-b641-6b139b37a855-0002"
}
failover_timeout: 3.14496E7
checkpoint: true
role: "*"
10:03:47.357 [HdfsScheduler] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Registering without authentication
10:03:47.371 [main] INFO o.e.jetty.server.ServerConnector - Started ServerConnector@3eb7fc54{HTTP/1.1}{0.0.0.0:8765}
10:03:47.371 [main] INFO org.eclipse.jetty.server.Server - Started @677ms
10:03:47.387 [Thread-11] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Registered framework frameworkId=835221b1-c248-4c7a-b641-6b139b37a855-0002
10:03:47.387 [Thread-11] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'RECONCILING_TASKS'
10:03:47.389 [Thread-12] INFO o.a.m.hdfs.scheduler.HdfsScheduler - =========================================
10:03:47.389 [Thread-12] INFO o.a.m.hdfs.scheduler.HdfsScheduler - pendingTasks size: 1
10:03:47.389 [Thread-12] INFO o.a.m.hdfs.scheduler.HdfsScheduler - task.journalnode.journalnode.NodeExecutor.1460452855798
10:03:47.389 [Thread-12] INFO o.a.m.hdfs.scheduler.HdfsScheduler - =========================================
10:03:47.389 [Thread-12] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Explicitly Reconciling Tasks
10:03:47.398 [Thread-12] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Implicitly Reconciling Tasks
10:03:47.398 [Thread-12] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sleeping for 4000ms before retrying reconciliation.
10:03:47.409 [Thread-13] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received 6 offers
10:03:47.411 [Thread-13] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50699"
10:03:47.411 [Thread-13] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50700"
10:03:47.411 [Thread-13] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50701"
10:03:47.411 [Thread-13] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50702"
10:03:47.411 [Thread-13] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50703"
10:03:47.412 [Thread-13] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50704"
10:03:47.440 [Thread-13] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 1/1 and name counts: 0/2
10:03:47.440 [Thread-13] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'RECONCILING_TASKS'
10:03:47.440 [Thread-13] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: RECONCILING_TASKS
10:03:47.444 [Thread-13] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Scheduler in phase: RECONCILING_TASKS, declining offer: value: "835221b1-c248-4c7a-b641-6b139b37a855-O50702"
10:03:47.446 [Thread-14] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.journalnode.journalnode.NodeExecutor.1460452855798 state=TASK_RUNNING message='Reconciliation: Latest task state'
10:03:47.449 [Thread-14] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers of TaskStatus: task_id {
value: "task.journalnode.journalnode.NodeExecutor.1460452855798"
}
state: TASK_RUNNING
message: "Reconciliation: Latest task state"
slave_id {
value: "835221b1-c248-4c7a-b641-6b139b37a855-S37"
}
timestamp: 1.460455427439771E9
source: SOURCE_MASTER
reason: REASON_RECONCILIATION
13: "\n\036\022\f172.32.2.201*\016\022\f172.32.2.201"
10:03:47.459 [Thread-14] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received task update for: task.journalnode.journalnode.NodeExecutor.1460452855798
10:03:47.459 [Thread-14] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Reconciliation is NOT complete
10:03:47.459 [Thread-14] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Reconciling Task 'task.journalnode.journalnode.NodeExecutor.1460452855798'.
10:03:47.459 [Thread-14] INFO o.a.m.hdfs.scheduler.HdfsScheduler - =========================================
10:03:47.459 [Thread-14] INFO o.a.m.hdfs.scheduler.HdfsScheduler - pendingTasks size: 0
10:03:47.459 [Thread-14] INFO o.a.m.hdfs.scheduler.HdfsScheduler - =========================================
10:03:47.459 [Thread-14] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Reconciliation is complete
10:03:47.462 [Thread-14] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1460452855798, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:03:47.472 [Thread-14] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 1/1 and name counts: 0/2
10:03:47.472 [Thread-14] INFO o.a.mesos.hdfs.state.StateMachine - Transitioning from acquisition phase 'RECONCILING_TASKS' to 'NAME_NODES'
10:03:47.472 [Thread-14] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: NAME_NODES
10:03:47.472 [Thread-15] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.journalnode.journalnode.NodeExecutor.1460452855798 state=TASK_RUNNING message='Reconciliation: Latest task state'
10:03:47.473 [Thread-15] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers of TaskStatus: task_id {
value: "task.journalnode.journalnode.NodeExecutor.1460452855798"
}
state: TASK_RUNNING
message: "Reconciliation: Latest task state"
slave_id {
value: "835221b1-c248-4c7a-b641-6b139b37a855-S37"
}
timestamp: 1.460455427440362E9
source: SOURCE_MASTER
reason: REASON_RECONCILIATION
13: "\n\036\022\f172.32.2.201*\016\022\f172.32.2.201"
10:03:47.479 [Thread-15] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received task update for: task.journalnode.journalnode.NodeExecutor.1460452855798
10:03:47.482 [Thread-15] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1460452855798, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:03:47.487 [Thread-15] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 1/1 and name counts: 0/2
10:03:47.487 [Thread-15] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'NAME_NODES'
10:03:47.487 [Thread-15] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: NAME_NODES
10:03:52.480 [Thread-16] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received 6 offers
10:03:52.481 [Thread-16] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50705"
10:03:52.481 [Thread-16] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50706"
10:03:52.481 [Thread-16] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50707"
10:03:52.481 [Thread-16] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50708"
10:03:52.481 [Thread-16] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50709"
10:03:52.482 [Thread-16] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50710"
10:03:52.483 [Thread-16] INFO o.a.m.hdfs.scheduler.NodeLauncher - Node: namenode, evaluating offer: value: "835221b1-c248-4c7a-b641-6b139b37a855-O50705"
10:03:52.484 [Thread-16] INFO o.apache.mesos.hdfs.util.DnsResolver - Resolving DNS for journalnode1.hdfs.rosen.local
10:03:52.486 [Thread-16] INFO o.apache.mesos.hdfs.util.DnsResolver - Successfully found journalnode1.hdfs.rosen.local
10:03:52.500 [Thread-16] INFO o.a.m.hdfs.scheduler.NodeLauncher - Node: namenode, accepting offer: value: "835221b1-c248-4c7a-b641-6b139b37a855-O50705"
10:03:52.523 [Thread-16] INFO o.a.mesos.hdfs.scheduler.HdfsNode - [name: "LD_LIBRARY_PATH"
value: "/usr/local/lib"
, name: "EXECUTOR_OPTS"
value: "-Xmx256m -Xms256m"
]
10:03:52.527 [Thread-16] INFO o.a.mesos.hdfs.scheduler.HdfsNode - [name: "LD_LIBRARY_PATH"
value: "/usr/local/lib"
, name: "EXECUTOR_OPTS"
value: "-Xmx256m -Xms256m"
]
10:03:52.528 [Thread-16] WARN o.apache.mesos.hdfs.state.HdfsState - Deserialization failed (as expected when setting status for the first time) with exception: java.io.EOFException
10:03:52.528 [Thread-16] INFO o.apache.mesos.hdfs.state.HdfsState - Setting new status: null
10:03:52.544 [Thread-16] WARN o.apache.mesos.hdfs.state.HdfsState - Deserialization failed (as expected when setting status for the first time) with exception: java.io.EOFException
10:03:52.544 [Thread-16] INFO o.apache.mesos.hdfs.state.HdfsState - Setting new status: null
10:03:56.627 [Thread-17] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.namenode.namenode.NameNodeExecutor.1460455432500 state=TASK_RUNNING message=''
10:03:56.629 [Thread-17] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers of TaskStatus: task_id {
value: "task.namenode.namenode.NameNodeExecutor.1460455432500"
}
state: TASK_RUNNING
slave_id {
value: "835221b1-c248-4c7a-b641-6b139b37a855-S37"
}
timestamp: 1.460455436628709E9
executor_id {
value: "executor.namenode.NameNodeExecutor.1460455432500"
}
source: SOURCE_EXECUTOR
uuid: "\363r>8u\253@\321\204\356^^\207\006d`"
labels {
labels {
key: "status"
value: "uninitialized"
}
}
13: "\n\036\022\f172.32.2.201*\016\022\f172.32.2.201"
10:03:56.636 [Thread-17] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received task update for: task.namenode.namenode.NameNodeExecutor.1460455432500
10:03:56.641 [Thread-17] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1460452855798, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:03:56.641 [Thread-17] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.namenode.namenode.NameNodeExecutor.1460455432500, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:03:56.654 [Thread-17] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 1/1 and name counts: 1/2
10:03:56.654 [Thread-17] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'NAME_NODES'
10:03:56.654 [Thread-17] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: NAME_NODES
10:03:56.654 [Thread-18] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.zkfc.namenode.NameNodeExecutor.1460455432500 state=TASK_RUNNING message=''
10:03:56.655 [Thread-18] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers of TaskStatus: task_id {
value: "task.zkfc.namenode.NameNodeExecutor.1460455432500"
}
state: TASK_RUNNING
slave_id {
value: "835221b1-c248-4c7a-b641-6b139b37a855-S37"
}
timestamp: 1.460455436628843E9
executor_id {
value: "executor.namenode.NameNodeExecutor.1460455432500"
}
source: SOURCE_EXECUTOR
uuid: ".#\246\024\023eCM\201I;\276\202h\224X"
13: "\n\036\022\f172.32.2.201*\016\022\f172.32.2.201"
10:03:56.663 [Thread-18] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received task update for: task.zkfc.namenode.NameNodeExecutor.1460455432500
10:03:56.668 [Thread-18] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1460452855798, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:03:56.669 [Thread-18] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.namenode.namenode.NameNodeExecutor.1460455432500, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:03:56.669 [Thread-18] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.zkfc.namenode.NameNodeExecutor.1460455432500, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:03:56.681 [Thread-18] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 1/1 and name counts: 1/2
10:03:56.681 [Thread-18] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'NAME_NODES'
10:03:56.681 [Thread-18] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: NAME_NODES
10:03:58.121 [Thread-19] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.namenode.namenode.NameNodeExecutor.1460455432500 state=TASK_FAILED message=''
10:03:58.122 [Thread-19] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers of TaskStatus: task_id {
value: "task.namenode.namenode.NameNodeExecutor.1460455432500"
}
state: TASK_FAILED
slave_id {
value: "835221b1-c248-4c7a-b641-6b139b37a855-S37"
}
timestamp: 1.460455438120914E9
executor_id {
value: "executor.namenode.NameNodeExecutor.1460455432500"
}
source: SOURCE_EXECUTOR
uuid: "\005\332\242\317\316\001A\247\266\362\022\322\030\366s0"
13: "\n\036\022\f172.32.2.201*\016\022\f172.32.2.201"
10:03:58.130 [Thread-19] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received task update for: task.namenode.namenode.NameNodeExecutor.1460455432500
10:03:58.134 [qtp63001505-17] WARN org.eclipse.jetty.server.HttpChannel - /hdfs-site.xml?nn=namenode1
java.io.IOException: java.io.EOFException
at org.apache.mesos.hdfs.config.ConfigServer$ServeHdfsConfigHandler.handle(ConfigServer.java:108) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.server.Server.handle(Server.java:485) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:290) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248) [hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) [hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:606) [hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:535) [hdfs-scheduler-0.1.5-uber.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
Caused by: java.io.EOFException: null
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2335) ~[na:1.8.0_77]
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2804) ~[na:1.8.0_77]
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:802) ~[na:1.8.0_77]
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299) ~[na:1.8.0_77]
at org.apache.mesos.hdfs.state.Serializer.deserialize(Serializer.java:25) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.apache.mesos.hdfs.state.HdfsState.getTask(HdfsState.java:226) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.apache.mesos.hdfs.state.HdfsState.getTasks(HdfsState.java:160) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.apache.mesos.hdfs.state.HdfsState.getTasks(HdfsState.java:169) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.apache.mesos.hdfs.state.HdfsState.getJournalNodeTasks(HdfsState.java:254) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.apache.mesos.hdfs.config.ConfigServer$ServeHdfsConfigHandler.handle(ConfigServer.java:106) ~[hdfs-scheduler-0.1.5-uber.jar:na]
... 9 common frames omitted
10:03:58.134 [Thread-19] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1460452855798, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:03:58.135 [Thread-19] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.zkfc.namenode.NameNodeExecutor.1460455432500, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:03:58.145 [Thread-19] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 1/1 and name counts: 0/2
10:03:58.145 [Thread-19] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'NAME_NODES'
10:03:58.145 [Thread-19] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: NAME_NODES
10:03:58.485 [Thread-20] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received 6 offers
10:03:58.486 [Thread-20] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50711"
10:03:58.486 [Thread-20] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50712"
10:03:58.486 [Thread-20] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50713"
10:03:58.487 [Thread-20] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50714"
10:03:58.487 [Thread-20] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50715"
10:03:58.487 [Thread-20] INFO o.a.m.hdfs.scheduler.HdfsScheduler - value: "835221b1-c248-4c7a-b641-6b139b37a855-O50716"
10:03:58.487 [Thread-20] INFO o.a.m.hdfs.scheduler.NodeLauncher - Node: namenode, evaluating offer: value: "835221b1-c248-4c7a-b641-6b139b37a855-O50712"
10:03:58.487 [Thread-20] INFO o.apache.mesos.hdfs.util.DnsResolver - Resolving DNS for journalnode1.hdfs.rosen.local
10:03:58.487 [Thread-20] INFO o.apache.mesos.hdfs.util.DnsResolver - Successfully found journalnode1.hdfs.rosen.local
10:03:58.504 [Thread-20] INFO o.a.m.hdfs.scheduler.NodeLauncher - Node: namenode, accepting offer: value: "835221b1-c248-4c7a-b641-6b139b37a855-O50712"
10:03:58.513 [Thread-20] INFO o.a.mesos.hdfs.scheduler.HdfsNode - [name: "LD_LIBRARY_PATH"
value: "/usr/local/lib"
, name: "EXECUTOR_OPTS"
value: "-Xmx256m -Xms256m"
]
10:03:58.514 [Thread-20] INFO o.a.mesos.hdfs.scheduler.HdfsNode - [name: "LD_LIBRARY_PATH"
value: "/usr/local/lib"
, name: "EXECUTOR_OPTS"
value: "-Xmx256m -Xms256m"
]
10:03:58.515 [Thread-20] WARN o.apache.mesos.hdfs.state.HdfsState - Deserialization failed (as expected when setting status for the first time) with exception: java.io.EOFException
10:03:58.515 [Thread-20] INFO o.apache.mesos.hdfs.state.HdfsState - Setting new status: null
10:03:58.532 [Thread-20] WARN o.apache.mesos.hdfs.state.HdfsState - Deserialization failed (as expected when setting status for the first time) with exception: java.io.EOFException
10:03:58.532 [Thread-20] INFO o.apache.mesos.hdfs.state.HdfsState - Setting new status: null
10:04:02.715 [Thread-21] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.namenode.namenode.NameNodeExecutor.1460455438505 state=TASK_RUNNING message=''
10:04:02.715 [Thread-21] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers of TaskStatus: task_id {
value: "task.namenode.namenode.NameNodeExecutor.1460455438505"
}
state: TASK_RUNNING
slave_id {
value: "835221b1-c248-4c7a-b641-6b139b37a855-S37"
}
timestamp: 1.460455442715895E9
executor_id {
value: "executor.namenode.NameNodeExecutor.1460455438505"
}
source: SOURCE_EXECUTOR
uuid: "\322\a\332G\321\032G\335\266\232\324\272TF]\257"
labels {
labels {
key: "status"
value: "uninitialized"
}
}
13: "\n\036\022\f172.32.2.201*\016\022\f172.32.2.201"
10:04:02.722 [Thread-21] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received task update for: task.namenode.namenode.NameNodeExecutor.1460455438505
10:04:02.728 [Thread-21] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1460452855798, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:04:02.728 [Thread-21] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.namenode.namenode.NameNodeExecutor.1460455438505, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:04:02.728 [Thread-21] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.zkfc.namenode.NameNodeExecutor.1460455432500, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:04:02.742 [Thread-21] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 1/1 and name counts: 1/2
10:04:02.742 [Thread-21] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'NAME_NODES'
10:04:02.742 [Thread-21] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: NAME_NODES
10:04:02.742 [Thread-22] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.zkfc.namenode.NameNodeExecutor.1460455438505 state=TASK_RUNNING message=''
10:04:02.743 [Thread-22] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers of TaskStatus: task_id {
value: "task.zkfc.namenode.NameNodeExecutor.1460455438505"
}
state: TASK_RUNNING
slave_id {
value: "835221b1-c248-4c7a-b641-6b139b37a855-S37"
}
timestamp: 1.460455442716026E9
executor_id {
value: "executor.namenode.NameNodeExecutor.1460455438505"
}
source: SOURCE_EXECUTOR
uuid: "P\306\307j@\326J_\234xh\352\306\260\365\371"
13: "\n\036\022\f172.32.2.201*\016\022\f172.32.2.201"
10:04:02.750 [Thread-22] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received task update for: task.zkfc.namenode.NameNodeExecutor.1460455438505
10:04:02.756 [Thread-22] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.zkfc.namenode.NameNodeExecutor.1460455438505, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:04:02.756 [Thread-22] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1460452855798, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:04:02.756 [Thread-22] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.namenode.namenode.NameNodeExecutor.1460455438505, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:04:02.756 [Thread-22] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.zkfc.namenode.NameNodeExecutor.1460455432500, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:04:02.772 [Thread-22] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 1/1 and name counts: 1/2
10:04:02.772 [Thread-22] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'NAME_NODES'
10:04:02.772 [Thread-22] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: NAME_NODES
10:04:04.229 [Thread-23] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.namenode.namenode.NameNodeExecutor.1460455438505 state=TASK_FAILED message=''
10:04:04.230 [Thread-23] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Notifying observers of TaskStatus: task_id {
value: "task.namenode.namenode.NameNodeExecutor.1460455438505"
}
state: TASK_FAILED
slave_id {
value: "835221b1-c248-4c7a-b641-6b139b37a855-S37"
}
timestamp: 1.460455444232839E9
executor_id {
value: "executor.namenode.NameNodeExecutor.1460455438505"
}
source: SOURCE_EXECUTOR
uuid: "^n\3654!\342KM\271\247\fyQ|B\277"
13: "\n\036\022\f172.32.2.201*\016\022\f172.32.2.201"
10:04:04.236 [Thread-23] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Received task update for: task.namenode.namenode.NameNodeExecutor.1460455438505
10:04:04.237 [qtp63001505-17] WARN org.eclipse.jetty.server.HttpChannel - /hdfs-site.xml?nn=namenode1
java.io.IOException: java.io.EOFException
at org.apache.mesos.hdfs.config.ConfigServer$ServeHdfsConfigHandler.handle(ConfigServer.java:108) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.server.Server.handle(Server.java:485) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:290) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248) [hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) [hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:606) [hdfs-scheduler-0.1.5-uber.jar:na]
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:535) [hdfs-scheduler-0.1.5-uber.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
Caused by: java.io.EOFException: null
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2335) ~[na:1.8.0_77]
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2804) ~[na:1.8.0_77]
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:802) ~[na:1.8.0_77]
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299) ~[na:1.8.0_77]
at org.apache.mesos.hdfs.state.Serializer.deserialize(Serializer.java:25) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.apache.mesos.hdfs.state.HdfsState.getTask(HdfsState.java:226) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.apache.mesos.hdfs.state.HdfsState.getTasks(HdfsState.java:160) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.apache.mesos.hdfs.state.HdfsState.getTasks(HdfsState.java:169) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.apache.mesos.hdfs.state.HdfsState.getNameNodeTasks(HdfsState.java:249) ~[hdfs-scheduler-0.1.5-uber.jar:na]
at org.apache.mesos.hdfs.config.ConfigServer$ServeHdfsConfigHandler.handle(ConfigServer.java:105) ~[hdfs-scheduler-0.1.5-uber.jar:na]
... 9 common frames omitted
10:04:04.241 [Thread-23] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.zkfc.namenode.NameNodeExecutor.1460455438505, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:04:04.241 [Thread-23] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.journalnode.journalnode.NodeExecutor.1460452855798, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:04:04.241 [Thread-23] INFO o.a.m.hdfs.scheduler.HdfsScheduler - Sending message 'reload config' to taskId=task.zkfc.namenode.NameNodeExecutor.1460455432500, slaveId=835221b1-c248-4c7a-b641-6b139b37a855-S37
10:04:04.260 [Thread-23] INFO o.a.mesos.hdfs.state.StateMachine - Correcting phase with journal counts: 1/1 and name counts: 0/2
10:04:04.260 [Thread-23] INFO o.a.mesos.hdfs.state.StateMachine - Acquisition phase is already 'NAME_NODES'
10:04:04.260 [Thread-23] INFO o.a.mesos.hdfs.state.StateMachine - Current phase is now: NAME_NODES
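The ConfigServer warnings look like the most concrete symptom: both TASK_FAILED updates are accompanied by an EOFException raised while serving /hdfs-site.xml?nn=namenode1, which I assume the freshly launched namenode fetches during startup. The exception itself is the one ObjectInputStream throws when it is constructed over zero bytes, which is presumably what Serializer.deserialize gets back when a task entry exists in the state store but has no serialized value yet (I am only going by the stack trace here, not the project's source). A minimal, self-contained sketch that reproduces just that exception under those assumptions (the class name and the empty byte array are mine, purely for illustration):

import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.ObjectInputStream;

// Illustrative only: shows why an empty stored value produces the
// EOFException seen at ObjectInputStream.<init> in the stack trace above.
public class EmptyStateRepro {
    public static void main(String[] args) throws Exception {
        byte[] storedValue = new byte[0]; // stand-in for an empty/uninitialized state entry
        try {
            // The constructor reads the stream header first, so it fails
            // before readObject() is ever reached.
            new ObjectInputStream(new ByteArrayInputStream(storedValue));
        } catch (EOFException e) {
            System.out.println("EOFException from the constructor: " + e);
        }
    }
}

If the config request fails in this way while the executor is fetching hdfs-site.xml, the namenode could not come up and would report TASK_FAILED right after TASK_RUNNING, which would match the pattern in the logs above.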
To summarize: the namenode task is relaunched every few seconds and fails almost immediately after reporting TASK_RUNNING, and each failure coincides with the scheduler's ConfigServer throwing an EOFException while serving /hdfs-site.xml?nn=namenode1. What is your recommendation?