cloudera-deploy
[Feature Req] Tool/Script to generate "cluster.yml" configs from existing exported CDP cluster template json files
I'm not sure whether I'm missing a simpler strategy for preparing the new "cluster.yml" file this playbook requires (referenced via definition_path) from an existing CDP (7.1.x / Private Cloud) cluster via its exported template. Input from the Cloudera folks would be great :) Alternatively, or in addition, it would be very helpful to add a much more advanced "cluster.yml" example to the repo (e.g. for deploying a 3-master-node HA cluster, as was nicely done in the former HDP repo: https://github.com/hortonworks/ansible-hortonworks/blob/master/playbooks/group_vars/example-hdp-ha-3-masters-with-ranger-atlas )
If it turns out that no such script exists yet, I'm happy to write and contribute one myself.
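For readers unfamiliar with the export step: Cloudera Manager exposes a cluster's template through its REST API at `GET /api/<version>/clusters/<cluster name>/export`. The following is a minimal stdlib-only sketch of fetching it; the helper names, port 7180, plain HTTP, the `v41` API version and the credentials are all assumptions to adapt (CM reports its highest supported API version at `/api/version`).

```python
import base64
import json
import urllib.parse
import urllib.request


def export_url(cm_host: str, cluster_name: str, api_version: str = "v41",
               port: int = 7180, tls: bool = False) -> str:
    """Build the template export URL (version/port defaults are assumptions)."""
    scheme = "https" if tls else "http"
    # Cluster names may contain spaces, e.g. "Basic Cluster".
    name = urllib.parse.quote(cluster_name, safe="")
    return f"{scheme}://{cm_host}:{port}/api/{api_version}/clusters/{name}/export"


def basic_auth_header(user: str, password: str) -> dict:
    """HTTP basic auth header for the CM API."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}


def export_cluster_template(cm_host: str, cluster_name: str,
                            user: str, password: str, **kwargs) -> dict:
    """Download the exported cluster template and parse it as JSON."""
    request = urllib.request.Request(
        export_url(cm_host, cluster_name, **kwargs),
        headers=basic_auth_header(user, password),
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Something like `export_cluster_template("cm.example.com", "Basic Cluster", "admin", "admin")` would return the parsed template as a dict, ready to feed into a converter.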
Minimal features:
- map the contained services to host groups
- map all "configs" elements found to key/value pairs in the corresponding service's dict in "cluster.yml"
- (later) nice to have: an option to skip config values that were never changed from their defaults
- anything else required to make the result usable?
- handle the "refName" values in the source template JSON
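On the refName point: inside the exported JSON, refName values act only as cross-references (a hostTemplate lists the roleConfigGroup refNames it contains). A minimal sketch of resolving them through an index rather than by splitting strings such as "zookeeper-SERVER-BASE" on dashes; it assumes only the field names visible in the input extract below, and `RoleGroupRef` / `index_role_groups` are hypothetical helper names.

```python
from typing import Dict, NamedTuple


class RoleGroupRef(NamedTuple):
    """What a roleConfigGroup refName points to inside an exported template."""
    service_type: str  # e.g. "ZOOKEEPER"
    role_type: str     # e.g. "SERVER"
    base: bool         # False for custom (non-base) role groups


def index_role_groups(template: dict) -> Dict[str, RoleGroupRef]:
    """Map every roleConfigGroup refName to its service type and role type."""
    index: Dict[str, RoleGroupRef] = {}
    for service in template.get("services", []):
        for group in service.get("roleConfigGroups", []):
            index[group["refName"]] = RoleGroupRef(
                service_type=service["serviceType"],
                role_type=group["roleType"],
                # Treat a missing "base" flag as a custom group (assumption).
                base=group.get("base", False),
            )
    return index
```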
Script Input / Output examples
- Input file (just a small extract from an exported cluster template)
- I can give more details later on how to export a template, for those not familiar with it.
{
  "cdhVersion" : "7.1.4",
  "displayName" : "Basic Cluster",
  "cmVersion" : "7.1.4",
  "repositories" : [ ... ],
  "products" : [ {
    "version" : "7.1.4-1.cdh7.1.4.p0.6300266",
    "product" : "CDH"
  } ],
  "services" : [ {
    "refName" : "zookeeper",
    "serviceType" : "ZOOKEEPER",
    "serviceConfigs" : [ {
      "name" : "zookeeper_datadir_autocreate",
      "value" : "true"
    } ],
    "roleConfigGroups" : [ {
      "refName" : "zookeeper-SERVER-BASE",
      "roleType" : "SERVER",
      "configs" : [ {
        "name" : "zk_server_log_dir",
        "value" : "/var/log/zookeeper"
      }, {
        "name" : "dataDir",
        "variable" : "zookeeper-SERVER-BASE-dataDir"
      }, {
        "name" : "dataLogDir",
        "variable" : "zookeeper-SERVER-BASE-dataLogDir"
      } ],
      "base" : true
    } ]
  },
  ...
  } ],
  "hostTemplates" : [ {
    "refName" : "HostTemplate-0-from-eval-cdp-public[1-3].internal.cloudapp.net",
    "cardinality" : 3,
    "roleConfigGroupsRefNames" : [ "hdfs-DATANODE-BASE", "spark_on_yarn-GATEWAY-BASE", "yarn-NODEMANAGER-BASE" ]
  }, {
    "refName" : "HostTemplate-1-from-eval-cdp-public0.internal.cloudapp.net",
    "cardinality" : 1,
    "roleConfigGroupsRefNames" : [ "hdfs-NAMENODE-BASE", "hdfs-SECONDARYNAMENODE-BASE", "spark_on_yarn-GATEWAY-BASE", "spark_on_yarn-SPARK_YARN_HISTORY_SERVER-BASE", "yarn-JOBHISTORY-BASE", "yarn-RESOURCEMANAGER-BASE", "zookeeper-SERVER-BASE" ]
  } ],
  ...
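A minimal sketch of the config mapping for this extract, following the layout of the output example (service type, then SERVICEWIDE or a role type, then name/value). Entries that carry a "variable" instead of a "value" are placeholders supplied at import time, so they are collected for manual handling rather than guessed. The optional `defaults` map keyed by (service, scope, name) is a hypothetical input for the "skip defaults" idea; how to obtain it is left open. Only base role groups are converted, and all function names are illustrative.

```python
from typing import Any, Dict, List, Optional, Tuple

# (serviceType, "SERVICEWIDE" or roleType, config name)
Key = Tuple[str, str, str]


def _coerce(value: Optional[str]) -> Any:
    """Exported templates store every value as a string; map "true"/"false" to booleans."""
    if value in ("true", "false"):
        return value == "true"
    return value


def template_to_configs(
    template: dict, defaults: Optional[Dict[Key, str]] = None
) -> Tuple[Dict[str, Dict[str, Dict[str, Any]]], List[Key]]:
    """Return (configs, unresolved) in a cluster.yml-like `configs` layout."""
    defaults = defaults or {}
    configs: Dict[str, Dict[str, Dict[str, Any]]] = {}
    unresolved: List[Key] = []

    def add(service_type: str, scope: str, item: dict) -> None:
        key = (service_type, scope, item["name"])
        if "variable" in item:
            # Filled in at import time via instantiator variables; needs a manual value.
            unresolved.append(key)
            return
        if key in defaults and defaults[key] == item.get("value"):
            return  # still at its default, so leave it out of cluster.yml
        scoped = configs.setdefault(service_type, {}).setdefault(scope, {})
        scoped[item["name"]] = _coerce(item.get("value"))

    for service in template.get("services", []):
        service_type = service["serviceType"]
        for item in service.get("serviceConfigs", []):
            add(service_type, "SERVICEWIDE", item)
        for group in service.get("roleConfigGroups", []):
            if not group.get("base", False):
                continue  # custom role groups have no cluster.yml equivalent yet
            for item in group.get("configs", []):
                add(service_type, group["roleType"], item)
    return configs, unresolved
```

For the ZooKeeper extract above, `zookeeper_datadir_autocreate` lands under SERVICEWIDE, `zk_server_log_dir` under SERVER, and `dataDir` / `dataLogDir` end up in `unresolved`.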
- Output file, following the format of cluster.yml, e.g. roles/cloudera_deploy/defaults/basic_cluster.yml
clusters:
  - name: Basic Cluster
    services: [HDFS, YARN, ZOOKEEPER]
    repositories:
      - https://archive.cloudera.com/cdh7/7.1.4.0/parcels/
    configs:
      ZOOKEEPER:
        SERVICEWIDE:
          zookeeper_datadir_autocreate: true
        SERVER:
          zk_server_log_dir: /var/log/zookeeper
      HDFS:
        DATANODE:
          dfs_data_dir_list: /dfs/dn
        NAMENODE:
          dfs_name_dir_list: /dfs/nn
      ...
    host_templates:
      Master1:
        HDFS: [NAMENODE, SECONDARYNAMENODE]
        YARN: [RESOURCEMANAGER, JOBHISTORY]
        ZOOKEEPER: [SERVER]
      Workers:
        HDFS: [DATANODE]
        YARN: [NODEMANAGER]
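The host_templates part can be derived from hostTemplates by resolving each roleConfigGroupsRefNames entry back to its service and role type. A sketch assuming the same JSON field names: exported refNames such as "HostTemplate-0-from-eval-cdp-public[1-3].internal.cloudapp.net" make poor template names, so a caller-supplied rename map (hypothetical `names` argument) is used, falling back to `HostTemplate<N>`. Cardinality is not carried over, and roles of services missing from the `services` list (e.g. Spark gateways) still appear and would need filtering.

```python
from typing import Dict, List, Optional, Tuple


def template_to_host_templates(
    template: dict, names: Optional[Dict[str, str]] = None
) -> Dict[str, Dict[str, List[str]]]:
    """Build cluster.yml-style host_templates: {name: {SERVICE: [ROLE, ...]}}."""
    # roleConfigGroup refName -> (serviceType, roleType)
    role_groups: Dict[str, Tuple[str, str]] = {}
    for service in template.get("services", []):
        for group in service.get("roleConfigGroups", []):
            role_groups[group["refName"]] = (service["serviceType"], group["roleType"])

    names = names or {}
    result: Dict[str, Dict[str, List[str]]] = {}
    for i, host_template in enumerate(template.get("hostTemplates", [])):
        # Exported refNames embed host names, so allow a friendlier name.
        name = names.get(host_template["refName"], f"HostTemplate{i}")
        roles: Dict[str, List[str]] = {}
        for ref in host_template.get("roleConfigGroupsRefNames", []):
            service_type, role_type = role_groups[ref]  # KeyError means a dangling refName
            service_roles = roles.setdefault(service_type, [])
            if role_type not in service_roles:  # several groups of one role collapse
                service_roles.append(role_type)
        result[name] = roles
    return result
```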
I definitely agree this is a useful thing to have. In theory, a naive mapping from an exported cluster template into a cluster.yml shouldn't be too hard; the things that might be tricky are:
- Not all running clusters have host templates, so you may need to re-engineer those
- Any configs created by the overlays (TLS, Kerberos, OOM/heap-dump settings, log dir settings, etc.) would be hard to re-overlay.
- We don't currently support custom role groups, so things could fall down slightly there.
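Those caveats can at least be surfaced automatically during conversion. A hedged sketch that flags a missing hostTemplates section, non-base (custom) role groups, and any config names the caller marks as overlay-managed; `migration_warnings` and `overlay_keys` are hypothetical, and the overlay key set must be supplied by the user because the exact keys the TLS/Kerberos/etc. overlays set are not listed here.

```python
from typing import Iterable, List


def migration_warnings(template: dict, overlay_keys: Iterable[str] = ()) -> List[str]:
    """List the parts of an exported template that will not convert cleanly."""
    overlay = set(overlay_keys)
    warnings: List[str] = []
    if not template.get("hostTemplates"):
        warnings.append("template has no hostTemplates; host_templates must be designed by hand")
    for service in template.get("services", []):
        service_type = service["serviceType"]
        # Collect (scope, config) pairs from service-wide and role-level configs.
        items = [("SERVICEWIDE", c) for c in service.get("serviceConfigs", [])]
        for group in service.get("roleConfigGroups", []):
            if not group.get("base", False):
                warnings.append(
                    f"custom role group {group['refName']} "
                    f"({service_type}/{group['roleType']}) has no cluster.yml equivalent"
                )
            items += [(group["roleType"], c) for c in group.get("configs", [])]
        for scope, config in items:
            if config["name"] in overlay:
                warnings.append(
                    f"{service_type}/{scope}/{config['name']} is probably set by an overlay"
                )
    return warnings
```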
But if you fancy having a stab at it, I'd be happy to review and contribute. It could probably work quite nicely as a Jinja2 (j2) template, and since we're already integrated with Ansible, that could potentially enable a "clone-a-cluster" type feature.