hdfs-deprecated
Design Discussion of Dead Node Timeout
Data nodes do not need to be recovered as long as a sufficient number of data nodes remain.
There are a couple of options for recovering Name Nodes (NN) and Journal Nodes (JN):
- Use a single timestamp for Name Nodes and another timestamp for Journal Nodes. If two Name Nodes die within the recovery interval, the recovery period resets when the second Name Node dies. If more than one Journal Node dies within the recovery interval, the recovery period resets when the last Journal Node dies. This is the current implementation.
Pros: Simplicity in the design / Longer period for recovery in case of multiple failures
Cons: Lack of precision. The user may want a precise recovery timeout for each node and may not want any node to exceed it.
- Use a timestamp per NN and JN. This means that each node has a precise recovery period.
Pros: Precise; recovery is tracked on a per-node basis.
Cons: More complex design. Is a timestamp per node necessary for NNs and JNs?
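
To make the difference between the two options concrete, here is a minimal sketch. It is not the scheduler's actual code: the class names `SharedDeadline` and `PerNodeDeadline`, the method names, and the use of plain seconds for time are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SharedDeadline:
    """Option 1 (hypothetical sketch): one deadline shared by all nodes of a type."""
    timeout_s: float
    deadline: Optional[float] = None

    def node_died(self, node_id: str, now: float) -> None:
        # Every new failure overwrites the shared deadline, so the
        # recovery period restarts from the most recent failure.
        self.deadline = now + self.timeout_s

    def expired(self, node_id: str, now: float) -> bool:
        # node_id is ignored: all nodes of this type share one deadline.
        return self.deadline is not None and now >= self.deadline


@dataclass
class PerNodeDeadline:
    """Option 2 (hypothetical sketch): one deadline per node."""
    timeout_s: float
    deadlines: Dict[str, float] = field(default_factory=dict)

    def node_died(self, node_id: str, now: float) -> None:
        # Each node gets its own deadline; other nodes are unaffected.
        self.deadlines[node_id] = now + self.timeout_s

    def expired(self, node_id: str, now: float) -> bool:
        deadline = self.deadlines.get(node_id)
        return deadline is not None and now >= deadline


# Example: 60 s timeout, nn1 dies at t=0, nn2 dies at t=40.
shared = SharedDeadline(timeout_s=60)
per_node = PerNodeDeadline(timeout_s=60)
for tracker in (shared, per_node):
    tracker.node_died("nn1", now=0)
    tracker.node_died("nn2", now=40)
```

With these inputs, at t=70 the shared tracker still treats nn1 as within its recovery period because the second failure moved the deadline to t=100, while the per-node tracker reports nn1 as expired at t=60 and nn2 as still recovering. That is the trade-off described above: the shared timestamp is simpler and more lenient, while the per-node timestamp is exact.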
cc @adam-mesos