Elizabeth Lingg comments

Results: 77 comments of Elizabeth Lingg

Error: We need to coloate the namenode with a journalnode and there isno journalnode running on this host.

Sounds awesome @knuckolls. There should be a resource check when launching them to make sure that 2 of the 3 journal nodes have space for the NNs to colocate on them.
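The resource check suggested above could look roughly like the following. This is a minimal Python sketch, not the framework's code: `HostResources`, `hosts_with_room`, `can_colocate_namenodes`, and the CPU/memory fields are all hypothetical names chosen for illustration, and it assumes the scheduler already knows the free resources on each journal node host.

```python
from dataclasses import dataclass


@dataclass
class HostResources:
    """Free resources on one journal node host (hypothetical shape)."""
    hostname: str
    free_cpus: float
    free_mem_mb: int


def hosts_with_room(hosts, nn_cpus, nn_mem_mb):
    """Return the journal node hosts that could also fit a namenode."""
    return [h for h in hosts
            if h.free_cpus >= nn_cpus and h.free_mem_mb >= nn_mem_mb]


def can_colocate_namenodes(hosts, nn_cpus, nn_mem_mb, required=2):
    """True if at least `required` journal node hosts have room for an NN."""
    return len(hosts_with_room(hosts, nn_cpus, nn_mem_mb)) >= required
```

Running the check before launching the journal nodes, rather than after, would let the scheduler decline offers that cannot lead to a valid NN placement.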

Design Discussion of Dead Node Timeout

cc @adam-mesos

Upgrade of HDFS Framework

Thanks @adam-mesos, I think upgrading the version of Hadoop would be a specific scenario in the upgrade guide. There's upgrading the framework version and then there's upgrading the version of...

Is it possible to specify node IP for datanode?

Correct @tangzhankun, the answer is no currently. Thanks for bringing up this issue @teamsoo! Constraints in HDFS-Mesos may be quite useful.

Handle failure of scheduler over long period of time or repeated failure of scheduler

ZooKeeper storage with an in-memory cache may be a good solution for this. In fact, refactoring the persistent state to use an in-memory cache would be ideal.
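One way to read this suggestion is as a write-through cache in front of the persistent store. The sketch below is only an illustration in Python: `DictStore` is a stand-in for a ZooKeeper-backed store (it does not talk to ZooKeeper), and `CachedState` and its method names are hypothetical.

```python
class DictStore:
    """Stand-in for a ZooKeeper-backed store; a real one would talk to ZK."""

    def __init__(self):
        self._data = {}
        self.reads = 0  # counts store reads so cache hits are observable

    def get(self, key):
        self.reads += 1
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


class CachedState:
    """Write-through cache: writes hit the store first, reads prefer memory."""

    def __init__(self, store):
        self._store = store
        self._cache = {}

    def set(self, key, value):
        self._store.set(key, value)  # persist first so a crash loses nothing
        self._cache[key] = value

    def get(self, key):
        if key not in self._cache:
            value = self._store.get(key)
            if value is None:
                return None
            self._cache[key] = value  # warm the cache after a restart
        return self._cache[key]
```

Because every write reaches the store before the cache, a scheduler that restarts with an empty cache can rebuild its view lazily from the persistent copy.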

Handle failure of scheduler over long period of time or repeated failure of scheduler

Hi @tangzhankun, with the current implementation, if a node dies it has 1.5 minutes to recover (this time is configurable). If the scheduler fails over and a node has died...
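The recovery window described above can be pictured as a small timeout tracker. This is a hedged Python sketch, not the framework's implementation: only the 1.5 minute (90 second) default comes from the comment, while `DeadNodeTracker`, its methods, and the injectable `clock` are hypothetical. State is held in memory only, so it says nothing about what happens across a scheduler failover.

```python
import time

DEFAULT_TIMEOUT_SECONDS = 90.0  # 1.5 minutes, per the comment; configurable


class DeadNodeTracker:
    """Tracks when nodes died and reports those past the recovery window."""

    def __init__(self, timeout_seconds=DEFAULT_TIMEOUT_SECONDS,
                 clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock
        self._died_at = {}

    def mark_dead(self, node):
        # Keep the first time of death so repeated reports
        # do not extend the recovery window.
        self._died_at.setdefault(node, self._clock())

    def mark_recovered(self, node):
        self._died_at.pop(node, None)

    def expired(self):
        """Nodes that have been dead for at least the timeout, sorted by name."""
        now = self._clock()
        return sorted(n for n, t in self._died_at.items()
                      if now - t >= self._timeout)
```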

If a JN/NN/ZK task dies due to a slave being lost, it is never started.

@abhay-agarwal, yes, this will be fixed when there is a configurable number of DNs. A workaround is that when the slave dies, a new slave gets spun up and...

Handle failure of scheduler over long period of time or repeated failure of scheduler

Corrected steps: 1. The scheduler's run() will execute, and then the callback registered() or reregistered() will be called. 2. driver.reconcileTasks() will be called, which will send status updates for all running...
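The two steps above can be sketched as follows. This is a simplified Python illustration and does not use the Mesos bindings: `RecordingDriver` is a stand-in that only records reconcile requests, task IDs and state strings replace the real status objects, and `ReconcilingScheduler`, `on_status`, and `unconfirmed` are hypothetical names.

```python
class RecordingDriver:
    """Stand-in for a scheduler driver; only records reconcile requests."""

    def __init__(self):
        self.reconcile_calls = []

    def reconcileTasks(self, statuses):
        self.reconcile_calls.append(list(statuses))


class ReconcilingScheduler:
    """Asks for reconciliation whenever it (re)registers with the master."""

    def __init__(self, known_task_ids):
        self._known = list(known_task_ids)
        self.confirmed = set()

    def registered(self, driver, framework_id, master_info):
        self._reconcile(driver)

    def reregistered(self, driver, master_info):
        self._reconcile(driver)

    def _reconcile(self, driver):
        # Forget earlier confirmations; fresh status updates will rebuild them.
        self.confirmed.clear()
        driver.reconcileTasks(self._known)

    def on_status(self, task_id, state):
        # Simplified status callback (hypothetical signature).
        if state == "TASK_RUNNING":
            self.confirmed.add(task_id)
        else:
            self.confirmed.discard(task_id)

    def unconfirmed(self):
        """Known tasks that have not been reported as running since reconcile."""
        return [t for t in self._known if t not in self.confirmed]
```

Tasks still listed by `unconfirmed()` after the status updates arrive are candidates for relaunching, subject to whatever recovery timeout the scheduler applies.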

Handle failure of scheduler over long period of time or repeated failure of scheduler

Hi @tangzhankun, yes, your description of the issue is correct.

Handle failure of scheduler over long period of time or repeated failure of scheduler

Correct, not serious, but we still want to get rid of these bugs as well!