Datanode hdfs-site.xml not updated correctly
While running the HDFS framework on our Mesos cluster, we've noticed it stops working frequently. We run on AWS and have a 10-node Mesos cluster, which runs 2 namenodes, 3 journalnodes (2 co-located), and 7 datanodes.
Datanodes continuously drop off the file system. There are too many logs to post here as evidence, but here's what we've noticed:
- Datanodes drop off. Upon checking, the datanodes that are offline ARE running, but their hdfs-site.xml contains the IP of a slave that is now gone from Mesos altogether (it was terminated by AWS).
- Checking the framework's config server, the hdfs-site.xml it returns is correct and has the appropriate namenode configuration.
- Checking the stdout/stderr of the datanodes that have dropped off, the logs claim the configuration is being refreshed from the config server, but checking /opt/mesosphere/hdfs/etc/hdfs-site.xml or the datanode's sandbox shows the file is NOT being updated.
We tried killing the datanode process, but that didn't seem to work; the hdfs-site.xml was still out of date. After killing the executor process and the datanode process at the same time, the file started getting updated correctly and the datanode(s) came up.
We also observed the same issue with the journalnodes and namenodes. With the namenodes there's an added problem: if both namenodes are killed at the same time, they can never come back up, since the configs are updated incorrectly. In that situation a backup namenode doesn't exist, but the process still writes an hdfs-site.xml with configuration for the secondary namenode that contains only ":port". Both the primary and secondary namenodes then fail to come up because of the invalid hdfs-site.xml.
Adding a bit more here. It appears that the config server IP & port are made available in executorInfo and are not updated once the HDFS scheduler restarts on a different node. The only way to recover is to restart the executor so that it gets the correct host:port for the config server. It may be better to store the config server host:port in ZooKeeper or pass it in the reloadConfigsOnAllRunningTasks call; a rough sketch of the latter idea is below.
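For illustration only, here's a minimal sketch of what pushing the current address from the scheduler could look like, assuming a framework message is the transport. `ConfigRefresher`, the message format, and the bookkeeping map are made up, and the real `reloadConfigsOnAllRunningTasks` in the scheduler has a different signature; only `SchedulerDriver.sendFrameworkMessage` is the actual Mesos API:

```java
// Hypothetical sketch: after the scheduler (re-)registers, push the current
// config server host:port to every running executor instead of relying on
// the stale value baked into executorInfo at launch time.
import org.apache.mesos.Protos.ExecutorID;
import org.apache.mesos.Protos.SlaveID;
import org.apache.mesos.SchedulerDriver;

import java.nio.charset.StandardCharsets;
import java.util.Map;

public class ConfigRefresher {
  // Executor -> slave mapping for all running tasks (hypothetical bookkeeping).
  private final Map<ExecutorID, SlaveID> runningExecutors;
  private final SchedulerDriver driver;

  public ConfigRefresher(SchedulerDriver driver, Map<ExecutorID, SlaveID> runningExecutors) {
    this.driver = driver;
    this.runningExecutors = runningExecutors;
  }

  /** Broadcast the scheduler's current config server address to all executors. */
  public void reloadConfigsOnAllRunningTasks(String configServerHost, int configServerPort) {
    byte[] message = ("RELOAD_CONFIG " + configServerHost + ":" + configServerPort)
        .getBytes(StandardCharsets.UTF_8);
    for (Map.Entry<ExecutorID, SlaveID> entry : runningExecutors.entrySet()) {
      // sendFrameworkMessage is best-effort; executors that miss the message
      // would still hold a stale address, so retries would be needed.
      driver.sendFrameworkMessage(entry.getKey(), entry.getValue(), message);
    }
  }
}
```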
@yoeduardoj and @bhallakapil good catch! hdfs-mesos isn't designed well in this area. thoughts:
- we don't support the killing of both namenodes at this point.
- as you discovered, the "refresh" of hdfs-site.xml is done by the executor and only happens at launch. We need to solve this... likely by sending a frameworkMessage to refresh configs (see the sketch after this list).
- also as you discovered, if the scheduler's IP or port changes, executors will not be able to pull a config change because the config server address is set in the executorInfo at launch. We need to fix this. We have a solution on DC/OS; fixing this on vanilla Mesos would likely mean a framework message or a ZK entry.
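To make the frameworkMessage idea concrete, here's a rough, hypothetical sketch of the executor side: parse a host:port out of the message and re-fetch hdfs-site.xml over HTTP. The message format, the `/hdfs-site.xml` endpoint, and the class name are assumptions, not the project's actual code; the handler would be called from `Executor.frameworkMessage(ExecutorDriver, byte[])`:

```java
// Hypothetical executor-side handler: on a RELOAD_CONFIG message, re-fetch
// hdfs-site.xml from the address in the message instead of the launch-time
// address baked into executorInfo.
import org.apache.mesos.ExecutorDriver;

import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class ConfigReloadHandler {

  /** Delegate target for Executor.frameworkMessage(ExecutorDriver, byte[]). */
  public void frameworkMessage(ExecutorDriver driver, byte[] data) {
    String message = new String(data, StandardCharsets.UTF_8);
    if (!message.startsWith("RELOAD_CONFIG ")) {
      return; // not a config-reload request
    }
    String hostPort = message.substring("RELOAD_CONFIG ".length()).trim();
    try {
      // The served path is an assumption; the real config server endpoint may differ.
      URL configUrl = new URL("http://" + hostPort + "/hdfs-site.xml");
      try (InputStream in = configUrl.openStream()) {
        Files.copy(in, Paths.get("/opt/mesosphere/hdfs/etc/hdfs-site.xml"),
            StandardCopyOption.REPLACE_EXISTING);
      }
      // A full solution would also restart or signal the datanode/namenode
      // process so it picks up the new configuration.
    } catch (Exception e) {
      // Leave the old config in place if the fetch fails; the scheduler can retry.
      e.printStackTrace();
    }
  }
}
```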
I have opened https://github.com/mesosphere/hdfs/pull/252 to address the configuration reload issue. Please review and merge.