hdfs-deprecated icon indicating copy to clipboard operation
hdfs-deprecated copied to clipboard

Failed to start the framework

Open a2045z opened this issue 10 years ago • 3 comments

I tried to launch the HDFS framework on top of Mesos using Virtual Box and Vagrant. Here is the Vagrantfile I am using:

Vagrant.configure(2) do |config|
  config.vm.box = "chef/centos-7.0"

  config.vm.define "m1" do |n|
    n.vm.network "private_network", ip: "192.168.33.10"
    n.vm.hostname = "m1"
  end

  config.vm.define "m2" do |n|
    n.vm.network "private_network", ip: "192.168.33.11"
    n.vm.hostname = "m2"
  end

  config.vm.define "m3" do |n|
    n.vm.network "private_network", ip: "192.168.33.12"
    n.vm.hostname = "m3"
  end

  config.vm.define "s1" do |n|
    n.vm.network "private_network", ip: "192.168.33.13"
    n.vm.hostname = "s1"
  end

  config.vm.define "s2" do |n|
    n.vm.network "private_network", ip: "192.168.33.14"
    n.vm.hostname = "s2"
  end

  config.vm.define "s3" do |n|
    n.vm.network "private_network", ip: "192.168.33.15"
    n.vm.hostname = "s3"
  end

  config.vm.define "s4" do |n|
    n.vm.network "private_network", ip: "192.168.33.16"
    n.vm.hostname = "s4"
  end
end

I am setting up a 7-node cluster with 3 master nodes running on m1, m2 and m3, and 4 slave nodes running on s1, s2, s3 and s4. I was able to launch the Mesos cluster, but when I tried to start the HDFS framework, it seemed that only the 3 JN nodes were started normally, here is the scheduler log:

bash$ ./bin/hdfs-mesos
2015-08-24 12:02:14,626:3630(0x7f89a0d03700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2015-08-24 12:02:14,626:3630(0x7f89a0d03700):ZOO_INFO@log_env@716: Client environment:host.name=s4
2015-08-24 12:02:14,626:3630(0x7f89a0d03700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2015-08-24 12:02:14,626:3630(0x7f89a0d03700):ZOO_INFO@log_env@724: Client environment:os.arch=3.10.0-123.el7.x86_64
2015-08-24 12:02:14,626:3630(0x7f89a0d03700):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Mon Jun 30 12:09:22 UTC 2014
2015-08-24 12:02:14,626:3630(0x7f89a0d03700):ZOO_INFO@log_env@733: Client environment:user.name=vagrant
2015-08-24 12:02:14,626:3630(0x7f89a0d03700):ZOO_INFO@log_env@741: Client environment:user.home=/home/vagrant
2015-08-24 12:02:14,626:3630(0x7f89a0d03700):ZOO_INFO@log_env@753: Client environment:user.dir=/vagrant/hdfs/build/hdfs-mesos-0.1.3
2015-08-24 12:02:14,626:3630(0x7f89a0d03700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=m1:2181,m2:2181,m3:2181 sessionTimeout=20000 watcher=0x7f89a2d920b0 sessionId=0 sessionPasswd=<null> context=0x7f89a8376780 flags=0
2015-08-24 12:02:14,630:3630(0x7f8997ff7700):ZOO_INFO@check_events@1703: initiated connection to server [192.168.33.10:2181]
2015-08-24 12:02:14,635:3630(0x7f8997ff7700):ZOO_INFO@check_events@1750: session establishment complete on server [192.168.33.10:2181], sessionId=0x14f5f14a500000c, negotiated timeout=20000
12:02:14.745 [main] INFO  org.eclipse.jetty.util.log - Logging initialized @1465ms
12:02:14.804 [HdfsScheduler] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Registering without authentication
I0824 12:02:14.809234  3652 sched.cpp:157] Version: 0.23.0
2015-08-24 12:02:14,811:3630(0x7f899bfff700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2015-08-24 12:02:14,811:3630(0x7f899bfff700):ZOO_INFO@log_env@716: Client environment:host.name=s4
2015-08-24 12:02:14,811:3630(0x7f899bfff700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2015-08-24 12:02:14,811:3630(0x7f899bfff700):ZOO_INFO@log_env@724: Client environment:os.arch=3.10.0-123.el7.x86_64
2015-08-24 12:02:14,811:3630(0x7f899bfff700):ZOO_INFO@log_env@725: Client environment:os.version=#1 SMP Mon Jun 30 12:09:22 UTC 2014
2015-08-24 12:02:14,811:3630(0x7f899bfff700):ZOO_INFO@log_env@733: Client environment:user.name=vagrant
2015-08-24 12:02:14,811:3630(0x7f899bfff700):ZOO_INFO@log_env@741: Client environment:user.home=/home/vagrant
2015-08-24 12:02:14,811:3630(0x7f899bfff700):ZOO_INFO@log_env@753: Client environment:user.dir=/vagrant/hdfs/build/hdfs-mesos-0.1.3
2015-08-24 12:02:14,811:3630(0x7f899bfff700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=m1:2181,m2:2181,m3:2181 sessionTimeout=10000 watcher=0x7f89a2d920b0 sessionId=0 sessionPasswd=<null> context=0x7f89bc001030 flags=0
2015-08-24 12:02:14,819:3630(0x7f8996ff5700):ZOO_INFO@check_events@1703: initiated connection to server [192.168.33.10:2181]
2015-08-24 12:02:14,835:3630(0x7f8996ff5700):ZOO_INFO@check_events@1750: session establishment complete on server [192.168.33.10:2181], sessionId=0x14f5f14a500000d, negotiated timeout=10000
I0824 12:02:14.835803  3648 group.cpp:313] Group process (group(1)@192.168.33.16:42345) connected to ZooKeeper
I0824 12:02:14.835847  3648 group.cpp:787] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0824 12:02:14.835858  3648 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
I0824 12:02:14.844012  3648 detector.cpp:138] Detected a new leader: (id='5')
I0824 12:02:14.844187  3648 group.cpp:656] Trying to get '/mesos/info_0000000005' in ZooKeeper
W0824 12:02:14.846402  3648 detector.cpp:444] Leading master [email protected]:5050 is using a Protobuf binary format when registering with ZooKeeper (info): this will be deprecated as of Mesos 0.24 (see MESOS-2340)
I0824 12:02:14.846482  3648 detector.cpp:481] A new leading master ([email protected]:5050) is detected
I0824 12:02:14.846539  3648 sched.cpp:254] New master detected at [email protected]:5050
I0824 12:02:14.846808  3648 sched.cpp:264] No credentials provided. Attempting to register without authentication
I0824 12:02:14.862030  3648 sched.cpp:448] Framework registered with 20150824-093959-169978048-5050-975-0000
12:02:14.883 [Thread-2] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Registered framework frameworkId=20150824-093959-169978048-5050-975-0000
12:02:14.939 [Thread-4] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 4 offers
12:02:14.940 [Thread-4] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Declining offers while reconciling tasks
12:02:14.944 [Thread-4] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Declining offers while reconciling tasks
12:02:14.944 [Thread-4] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Declining offers while reconciling tasks
12:02:14.944 [Thread-4] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Declining offers while reconciling tasks
12:02:14.954 [main] INFO  org.eclipse.jetty.server.Server - jetty-9.2.z-SNAPSHOT
12:02:14.962 [Thread-5] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.journalnode.journalnode.NodeExecutor.1440417681565 state=TASK_STAGING message='Reconciliation: Latest task state' stagingTasks.size=0
12:02:14.962 [Thread-5] WARN  o.a.m.hdfs.scheduler.HdfsScheduler - Don't know how to handle state=TASK_STAGING for taskId=task.journalnode.journalnode.NodeExecutor.1440417681565
12:02:14.963 [Thread-7] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.journalnode.journalnode.NodeExecutor.1440417678559 state=TASK_STAGING message='Reconciliation: Latest task state' stagingTasks.size=0
12:02:14.963 [Thread-7] WARN  o.a.m.hdfs.scheduler.HdfsScheduler - Don't know how to handle state=TASK_STAGING for taskId=task.journalnode.journalnode.NodeExecutor.1440417678559
12:02:14.963 [Thread-8] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.journalnode.journalnode.NodeExecutor.1440417675554 state=TASK_STAGING message='Reconciliation: Latest task state' stagingTasks.size=0
12:02:14.964 [Thread-8] WARN  o.a.m.hdfs.scheduler.HdfsScheduler - Don't know how to handle state=TASK_STAGING for taskId=task.journalnode.journalnode.NodeExecutor.1440417675554
12:02:14.964 [Thread-9] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.journalnode.journalnode.NodeExecutor.1440417597479 state=TASK_STAGING message='Reconciliation: Latest task state' stagingTasks.size=0
12:02:14.964 [Thread-9] WARN  o.a.m.hdfs.scheduler.HdfsScheduler - Don't know how to handle state=TASK_STAGING for taskId=task.journalnode.journalnode.NodeExecutor.1440417597479
12:02:14.964 [Thread-10] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.journalnode.journalnode.NodeExecutor.1440417591492 state=TASK_STAGING message='Reconciliation: Latest task state' stagingTasks.size=0
12:02:14.965 [Thread-10] WARN  o.a.m.hdfs.scheduler.HdfsScheduler - Don't know how to handle state=TASK_STAGING for taskId=task.journalnode.journalnode.NodeExecutor.1440417591492
12:02:14.965 [Thread-11] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.journalnode.journalnode.NodeExecutor.1440417590467 state=TASK_STAGING message='Reconciliation: Latest task state' stagingTasks.size=0
12:02:14.965 [Thread-11] WARN  o.a.m.hdfs.scheduler.HdfsScheduler - Don't know how to handle state=TASK_STAGING for taskId=task.journalnode.journalnode.NodeExecutor.1440417590467
12:02:15.063 [main] INFO  o.e.jetty.server.ServerConnector - Started ServerConnector@6dbb137d{HTTP/1.1}{0.0.0.0:8765}
12:02:15.063 [main] INFO  org.eclipse.jetty.server.Server - Started @1807ms
12:02:20.624 [Thread-20] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 4 offers
12:02:20.624 [Thread-20] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Declining offers while reconciling tasks
12:02:20.624 [Thread-20] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Declining offers while reconciling tasks
12:02:20.624 [Thread-20] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Declining offers while reconciling tasks
12:02:20.624 [Thread-20] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Declining offers while reconciling tasks
12:02:26.632 [Thread-21] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 4 offers
12:02:26.632 [Thread-21] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Declining offers while reconciling tasks
12:02:26.632 [Thread-21] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Declining offers while reconciling tasks
12:02:26.632 [Thread-21] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Declining offers while reconciling tasks
12:02:26.632 [Thread-21] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Declining offers while reconciling tasks
12:02:32.636 [Thread-22] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 4 offers
12:02:32.636 [Thread-22] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Declining offers while reconciling tasks
12:02:32.637 [Thread-22] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Declining offers while reconciling tasks
12:02:32.637 [Thread-22] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Declining offers while reconciling tasks
12:02:32.638 [Thread-22] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Declining offers while reconciling tasks
12:02:35.710 [Thread-23] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.journalnode.journalnode.NodeExecutor.1440417597479 state=TASK_LOST message='Abnormal executor termination' stagingTasks.size=0
12:02:35.824 [Thread-23] WARN  o.a.m.h.state.PersistentStateStore - task id: task.journalnode.journalnode.NodeExecutor.1440417597479 request to be removed doesn't exist
12:02:36.638 [Thread-24] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:02:36.638 [Thread-24] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Declining offers while reconciling tasks
12:02:38.640 [Thread-25] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
12:02:38.640 [Thread-25] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Declining offers while reconciling tasks
12:02:38.641 [Thread-25] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Declining offers while reconciling tasks
12:02:38.641 [Thread-25] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Declining offers while reconciling tasks
12:02:41.645 [Thread-26] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:02:41.645 [Thread-26] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Declining offers while reconciling tasks
12:02:43.651 [Thread-27] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
12:02:43.651 [Thread-27] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Declining offers while reconciling tasks
12:02:43.651 [Thread-27] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Declining offers while reconciling tasks
12:02:43.652 [Thread-27] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Declining offers while reconciling tasks
12:02:44.886 [Timer-0] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Current persistent state:
12:02:44.889 [Timer-0] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - JournalNodes: {s4=task.journalnode.journalnode.NodeExecutor.1440417675554, s1=task.journalnode.journalnode.NodeExecutor.1440417681565, s2=task.journalnode.journalnode.NodeExecutor.1440417678559}, {task.journalnode.journalnode.NodeExecutor.1440417681565=journalnode3, task.journalnode.journalnode.NodeExecutor.1440417675554=journalnode1, task.journalnode.journalnode.NodeExecutor.1440417678559=journalnode2}
12:02:44.892 [Timer-0] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - NameNodes: {}, {}
12:02:44.895 [Timer-0] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - DataNodes: {}
12:02:44.901 [Timer-0] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Removing task id: task.journalnode.journalnode.NodeExecutor.1440417681565
12:02:44.919 [Timer-0] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Removing task id: task.journalnode.journalnode.NodeExecutor.1440417675554
12:02:44.938 [Timer-0] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Removing task id: task.journalnode.journalnode.NodeExecutor.1440417678559
12:02:46.650 [Thread-28] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:02:46.652 [Thread-28] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - [s4, s1, s2]
12:02:46.653 [Thread-28] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Launching node of type journalnode with tasks [journalnode]
12:02:49.651 [Thread-29] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
12:02:49.655 [Thread-29] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - [s4, s2]
12:02:49.657 [Thread-29] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - [s4, s2]
12:02:49.657 [Thread-29] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Launching node of type journalnode with tasks [journalnode]
12:02:50.657 [Thread-30] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:02:50.661 [Thread-30] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - [s2]
12:02:52.652 [Thread-31] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:02:52.654 [Thread-31] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - [s2]
12:02:55.657 [Thread-32] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
12:02:55.659 [Thread-32] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - [s2]
12:02:55.660 [Thread-32] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - [s2]
12:02:55.661 [Thread-32] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Launching node of type journalnode with tasks [journalnode]
12:02:56.658 [Thread-33] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
12:02:56.662 [Thread-33] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:02:56.664 [Thread-33] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:02:56.666 [Thread-33] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:02:56.667 [Thread-33] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:02:57.658 [Thread-34] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:02:57.660 [Thread-34] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:02:57.662 [Thread-34] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:01.664 [Thread-35] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:03:01.666 [Thread-35] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:01.667 [Thread-35] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:02.666 [Thread-36] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
12:03:02.668 [Thread-36] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:02.672 [Thread-36] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:02.676 [Thread-36] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:02.685 [Thread-36] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:02.690 [Thread-36] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:02.691 [Thread-36] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:07.672 [Thread-37] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:03:07.674 [Thread-37] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:07.675 [Thread-37] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:08.673 [Thread-38] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
12:03:08.675 [Thread-38] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:08.676 [Thread-38] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:08.678 [Thread-38] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:08.680 [Thread-38] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:08.682 [Thread-38] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:08.684 [Thread-38] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:13.680 [Thread-39] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:03:13.684 [Thread-39] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:13.686 [Thread-39] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:14.680 [Thread-40] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
12:03:14.682 [Thread-40] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:14.683 [Thread-40] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:14.688 [Thread-40] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:14.689 [Thread-40] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:14.691 [Thread-40] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:14.692 [Thread-40] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:19.683 [Thread-41] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:03:19.687 [Thread-41] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:19.688 [Thread-41] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:20.686 [Thread-42] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
12:03:20.688 [Thread-42] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:20.690 [Thread-42] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:20.694 [Thread-42] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:20.696 [Thread-42] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:20.698 [Thread-42] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:20.699 [Thread-42] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:25.691 [Thread-43] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:03:25.695 [Thread-43] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:25.698 [Thread-43] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:26.691 [Thread-44] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
12:03:26.695 [Thread-44] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:26.696 [Thread-44] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:26.698 [Thread-44] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:26.699 [Thread-44] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:26.701 [Thread-44] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:26.702 [Thread-44] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:31.696 [Thread-45] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:03:31.697 [Thread-45] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:31.699 [Thread-45] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:32.698 [Thread-46] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
12:03:32.700 [Thread-46] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:32.702 [Thread-46] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:32.704 [Thread-46] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:32.706 [Thread-46] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:32.707 [Thread-46] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:32.709 [Thread-46] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:37.700 [Thread-47] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:03:37.703 [Thread-47] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:37.704 [Thread-47] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:38.701 [Thread-48] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
12:03:38.704 [Thread-48] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:38.705 [Thread-48] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:38.710 [Thread-48] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:38.711 [Thread-48] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:38.713 [Thread-48] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:38.714 [Thread-48] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:43.709 [Thread-49] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:03:43.713 [Thread-49] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:43.717 [Thread-49] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:44.707 [Thread-50] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
12:03:44.711 [Thread-50] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:44.713 [Thread-50] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:44.714 [Thread-50] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:44.716 [Thread-50] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:44.718 [Thread-50] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:44.721 [Thread-50] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:49.711 [Thread-51] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:03:49.715 [Thread-51] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:49.733 [Thread-51] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:50.711 [Thread-52] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
12:03:50.713 [Thread-52] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:50.714 [Thread-52] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:50.717 [Thread-52] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:50.719 [Thread-52] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:50.722 [Thread-52] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:50.724 [Thread-52] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:55.719 [Thread-53] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:03:55.720 [Thread-53] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:55.722 [Thread-53] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:56.720 [Thread-54] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 3 offers
12:03:56.722 [Thread-54] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:56.723 [Thread-54] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:56.725 [Thread-54] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:56.726 [Thread-54] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:56.728 [Thread-54] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:56.729 [Thread-54] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:03:59.615 [Thread-55] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received status update for taskId=task.journalnode.journalnode.NodeExecutor.1440417681565 state=TASK_LOST message='Abnormal executor termination' stagingTasks.size=3
12:03:59.620 [Thread-55] WARN  o.a.m.h.state.PersistentStateStore - task id: task.journalnode.journalnode.NodeExecutor.1440417681565 request to be removed doesn't exist
12:03:59.726 [Thread-56] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:03:59.730 [Thread-56] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:03:59.732 [Thread-56] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:04:00.726 [Thread-57] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:04:00.728 [Thread-57] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:04:00.731 [Thread-57] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:04:02.727 [Thread-58] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
12:04:02.730 [Thread-58] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:04:02.732 [Thread-58] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:04:02.734 [Thread-58] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:04:02.735 [Thread-58] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:04:05.730 [Thread-59] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:04:05.731 [Thread-59] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:04:05.733 [Thread-59] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:04:06.731 [Thread-60] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:04:06.733 [Thread-60] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:04:06.735 [Thread-60] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:04:08.734 [Thread-61] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
12:04:08.736 [Thread-61] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:04:08.737 [Thread-61] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:04:08.740 [Thread-61] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:04:08.741 [Thread-61] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:04:10.736 [Thread-62] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:04:10.738 [Thread-62] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:04:10.740 [Thread-62] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:04:11.738 [Thread-63] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:04:11.739 [Thread-63] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:04:11.741 [Thread-63] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:04:14.742 [Thread-64] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
12:04:14.752 [Thread-64] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:04:14.753 [Thread-64] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:04:14.754 [Thread-64] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:04:14.756 [Thread-64] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:04:16.743 [Thread-65] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:04:16.748 [Thread-65] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:04:16.749 [Thread-65] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:04:17.746 [Thread-66] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:04:17.749 [Thread-66] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:04:17.751 [Thread-66] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:04:20.749 [Thread-67] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
12:04:20.750 [Thread-67] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:04:20.752 [Thread-67] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:04:20.763 [Thread-67] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:04:20.766 [Thread-67] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:04:22.751 [Thread-68] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:04:22.754 [Thread-68] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:04:22.755 [Thread-68] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:04:23.753 [Thread-69] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:04:23.755 [Thread-69] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:04:23.756 [Thread-69] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:04:26.758 [Thread-70] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
12:04:26.761 [Thread-70] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:04:26.762 [Thread-70] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:04:26.764 [Thread-70] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:04:26.765 [Thread-70] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:04:27.759 [Thread-71] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:04:27.761 [Thread-71] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:04:27.762 [Thread-71] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:04:28.760 [Thread-72] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 1 offers
12:04:28.762 [Thread-72] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:04:28.763 [Thread-72] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:04:32.762 [Thread-73] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 2 offers
12:04:32.765 [Thread-73] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
...

a2045z avatar Aug 24 '15 12:08 a2045z

I checked one of the executor under the directory /tmp/mesos/slaves/20150824-093959-169978048-5050-975-S7/frameworks/20150824-093959-169978048-5050-975-0000/executors/executor.journalnode.NodeExecutor.1440418836711/runs/latest

It shows that the hdfs-site.xml was not resolved correctly:

<configuration>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>

  <property>
    <name>dfs.nameservice.id</name>
    <value>hdfs</value>
  </property>

  <property>
    <name>dfs.nameservices</name>
    <value>hdfs</value>
  </property>

  <property>
    <name>dfs.ha.namenodes.hdfs</name>
    <value>nn1,nn2</value>
  </property>

  <property>
    <name>dfs.namenode.rpc-address.hdfs.nn1</name>
    <value>:50071</value>
  </property>

  <property>
    <name>dfs.namenode.http-address.hdfs.nn1</name>
    <value>:50070</value>
  </property>

  <property>
    <name>dfs.namenode.rpc-address.hdfs.nn2</name>
    <value>:50071</value>
  </property>

  <property>
    <name>dfs.namenode.http-address.hdfs.nn2</name>
    <value>:50070</value>
  </property>

  <property>
    <name>dfs.client.failover.proxy.provider.hdfs</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>

  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://s1:8485;s2:8485;s4:8485/hdfs</value>
  </property>

  <property>
    <name>ha.zookeeper.quorum</name>
    <value>m1:2181,m2:2181,m3:2181</value>
  </property>

  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/var/lib/hdfs/data/jn</value>
  </property>

  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///var/lib/hdfs/data/name</value>
  </property>

  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///var/lib/hdfs/data/data</value>
  </property>

  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>shell(/bin/true)</value>
  </property>

  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>

  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>10485760</value>
  </property>

  <property>
    <name>dfs.datanode.balance.bandwidthPerSec</name>
    <value>41943040</value>
  </property>

  <property>
    <name>dfs.namenode.safemode.threshold-pct</name>
    <value>0.90</value>
  </property>

  <property>
    <name>dfs.namenode.heartbeat.recheck-interval</name>
    <!-- 60 seconds -->
    <value>60000</value>
  </property>

  <property>
    <name>dfs.datanode.handler.count</name>
    <value>10</value>
  </property>

  <property>
    <name>dfs.namenode.handler.count</name>
    <value>20</value>
  </property>

  <property>
    <name>dfs.image.compress</name>
    <value>true</value>
  </property>

  <property>
    <name>dfs.image.compression.codec</name>
    <value>org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>

  <property>
    <name>dfs.namenode.invalidate.work.pct.per.iteration</name>
    <value>0.35f</value>
  </property>

  <property>
    <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
    <value>4</value>
  </property>

  <!-- This property allows us to use IP's directly for communication instead of hostnames. -->
  <property>
    <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
    <value>false</value>
  </property>

  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>

  <property>
    <name>dfs.client.read.shortcircuit.streams.cache.size</name>
    <value>1000</value>
  </property>

  <property>
    <name>dfs.client.read.shortcircuit.streams.cache.size.expiry.ms</name>
    <value>1000</value>
  </property>

  <!-- This property needs to be consistent with mesos.hdfs.secondary.data.dir -->
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/run/hadoop-hdfs/dn._PORT</value>
  </property>
</configuration>

dfs.namenode.rpc-address.hdfs.nn1, dfs.namenode.http-address.hdfs.nn, dfs.namenode.rpc-address.hdfs.nn2 & dfs.namenode.http-address.hdfs.nn2 were not resolved correctly. Checking out issue-124, I just found:

I would recommend not touching the hdfs-site.xml file and making all of your modifications to the mesos-site.xml file.

Or should I manually touch the hdfs-site.xml?

a2045z avatar Aug 24 '15 12:08 a2045z

Even if I manually edited the hdfs-site.xml as

  <property>
    <name>dfs.namenode.rpc-address.${frameworkName}.nn1</name>
    <value>s1:50071</value>
  </property>

  <property>
    <name>dfs.namenode.http-address.${frameworkName}.nn1</name>
    <value>s1:50070</value>
  </property>

  <property>
    <name>dfs.namenode.rpc-address.${frameworkName}.nn2</name>
    <value>s2:50071</value>
  </property>

  <property>
    <name>dfs.namenode.http-address.${frameworkName}.nn2</name>
    <value>s2:50070</value>
  </property>

The scheduler log still get stuck:


12:59:52.146 [Thread-76] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Received 4 offers
12:59:52.147 [Thread-76] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:59:52.148 [Thread-76] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:59:52.149 [Thread-76] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:59:52.150 [Thread-76] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:59:52.151 [Thread-76] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:59:52.153 [Thread-76] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
12:59:52.154 [Thread-76] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - []
12:59:52.155 [Thread-76] INFO  o.a.m.hdfs.scheduler.HdfsScheduler - Already running 3 journalnodes
...

All 3 JN tasks were frozen at the STAGING state.

a2045z avatar Aug 24 '15 13:08 a2045z

Hi there, @a2045z !

Hm, you shouldn't have to edit *-site.xml with too many custom settings. It should work as is except you would want to properly point to your mesos.hdfs.zkfc.ha.zookeeper.quorum in mesos-site.xml so that you would be pointing to the appropriate zookeeper. Before building, you should update the mesos-site.xml in your conf directory.

Also, as far as verifying that the hdfs-site.xml is correct, you would have to check your sandbox under etc/hadoop to see the populated hdfs-site.xml. It will continue to be updated as more nodes launch on your cluster.

My best guess for the NN's and DN's not launching are a lack of resources. You would want to either scale down your resources in your mesos-site.xml file (which means HDFS will hold less data) before launching or increase your instance sizes.

elingg avatar Aug 25 '15 23:08 elingg