
[Feature Req] Tool/Script to generate "cluster.yml" configs from existing exported CDP cluster template json files

Open lhoss opened this issue 4 years ago • 1 comment

Not sure if I'm missing a simpler strategy for preparing the new "cluster.yml" file required by this playbook (to be configured in the definition_path) from an exported CDP (7.1.x / private-cloud) cluster template? Would be great to get input from the Cloudera folks :) Alternatively, or in addition, it would be very helpful if a much more advanced "cluster.yml" example were added to the repo (for example, to deploy a 3-master-node HA cluster, as was nicely done in the former HDP repo: https://github.com/hortonworks/ansible-hortonworks/blob/master/playbooks/group_vars/example-hdp-ha-3-masters-with-ranger-atlas)

If I learn here from the community that there is indeed no such existing script, I'm happy to write and contribute something myself.

Minimal features:

  • create the mapping of the contained services to the host-groups
  • create the mapping of all the found "configs" elements to key/value pairs in the service's "dict" element of "cluster.yml" (see the Python sketch after this list)
    • (later) nice to have: an option to skip any config values which are/were just defaults from the beginning
  • many other things that are required to make it usable/work?
  • handling the "refName" values in the source template JSON
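
For the two mapping bullets, here is the rough Python sketch referenced above (purely hypothetical code, not an existing cloudera-deploy interface; the function name and CLI shape are just illustrative):

#!/usr/bin/env python3
# Sketch: walk an exported cluster template and build the "configs"
# dict in the cluster.yml layout. Entries using "variable" instead of
# "value" point at instantiator variables and are skipped here; that
# is part of the "refName"/variable handling still to be worked out.
import json
import sys

def extract_configs(template):
    configs = {}
    for service in template.get("services", []):
        svc = configs.setdefault(service["serviceType"], {})
        # serviceConfigs -> SERVICEWIDE
        for c in service.get("serviceConfigs", []):
            if "value" in c:
                svc.setdefault("SERVICEWIDE", {})[c["name"]] = c["value"]
        # roleConfigGroups -> keyed by roleType (SERVER, DATANODE, ...)
        # NOTE: custom (non-BASE) role groups would collide on roleType here
        for rcg in service.get("roleConfigGroups", []):
            for c in rcg.get("configs", []):
                if "value" in c:
                    svc.setdefault(rcg["roleType"], {})[c["name"]] = c["value"]
    return configs

if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        template = json.load(f)
    services = sorted({s["serviceType"] for s in template.get("services", [])})
    print(json.dumps({"services": services, "configs": extract_configs(template)}, indent=2))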

Script Input / Output examples

  • Input file, just a small extract (from an exported cluster template)
    • I can give more info later on how to export one, for people not familiar with it.
{
  "cdhVersion" : "7.1.4",
  "displayName" : "Basic Cluster",
  "cmVersion" : "7.1.4",
  "repositories" : [ ... ],
  "products" : [ {
    "version" : "7.1.4-1.cdh7.1.4.p0.6300266",
    "product" : "CDH"
  } ],
  "services" : [ {
    "refName" : "zookeeper",
    "serviceType" : "ZOOKEEPER",
    "serviceConfigs" : [ {
      "name" : "zookeeper_datadir_autocreate",
      "value" : "true"
    } ],
    "roleConfigGroups" : [ {
      "refName" : "zookeeper-SERVER-BASE",
      "roleType" : "SERVER",
      "configs" : [ {
        "name" : "zk_server_log_dir",
        "value" : "/var/log/zookeeper"
      }, {
        "name" : "dataDir",
        "variable" : "zookeeper-SERVER-BASE-dataDir"
      }, {
        "name" : "dataLogDir",
        "variable" : "zookeeper-SERVER-BASE-dataLogDir"
      } ],
      "base" : true
    } ]
  }, 
...

  } ],
  "hostTemplates" : [ {
    "refName" : "HostTemplate-0-from-eval-cdp-public[1-3].internal.cloudapp.net",
    "cardinality" : 3,
    "roleConfigGroupsRefNames" : [ "hdfs-DATANODE-BASE", "spark_on_yarn-GATEWAY-BASE", "yarn-NODEMANAGER-BASE" ]
  }, {
    "refName" : "HostTemplate-1-from-eval-cdp-public0.internal.cloudapp.net",
    "cardinality" : 1,
    "roleConfigGroupsRefNames" : [ "hdfs-NAMENODE-BASE", "hdfs-SECONDARYNAMENODE-BASE", "spark_on_yarn-GATEWAY-BASE", "spark_on_yarn-SPARK_YARN_HISTORY_SERVER-BASE", "yarn-JOBHISTORY-BASE", "yarn-RESOURCEMANAGER-BASE", "zookeeper-SERVER-BASE" ]
  } ],
...

Output file, following the format of cluster.yml, for example roles/cloudera_deploy/defaults/basic_cluster.yml:

clusters:
  - name: Basic Cluster
    services: [HDFS, YARN, ZOOKEEPER]
    repositories:
      - https://archive.cloudera.com/cdh7/7.1.4.0/parcels/
    configs:
      ZOOKEEPER:
        SERVICEWIDE:
          zookeeper_datadir_autocreate: true
          zk_server_log_dir": "/var/log/zookeeper"
     
      HDFS:
        DATANODE:
          dfs_data_dir_list: /dfs/dn
        NAMENODE:
          dfs_name_dir_list: /dfs/nn
...
    host_templates:
      Master1:
        HDFS: [NAMENODE, SECONDARYNAMENODE]
        YARN: [RESOURCEMANAGER, JOBHISTORY]
        ZOOKEEPER: [SERVER]
      Workers:
        HDFS: [DATANODE]
        YARN: [NODEMANAGER]
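
For the host_templates section above, one way (again just a sketch, reusing the parsed template dict from the sketch above) is to resolve each roleConfigGroupsRefNames entry through the services section instead of string-splitting the refName, so service names like spark_on_yarn resolve cleanly:

def map_host_templates(template):
    # Index refName -> (serviceType, roleType) from the services section
    rcg_index = {}
    for service in template.get("services", []):
        for rcg in service.get("roleConfigGroups", []):
            rcg_index[rcg["refName"]] = (service["serviceType"], rcg["roleType"])
    host_templates = {}
    for ht in template.get("hostTemplates", []):
        groups = {}
        for ref in ht.get("roleConfigGroupsRefNames", []):
            service_type, role_type = rcg_index[ref]
            groups.setdefault(service_type, []).append(role_type)
        # exported refNames are unwieldy ("HostTemplate-0-from-..."), so a
        # real tool would probably want a rename option (e.g. Master1, Workers)
        host_templates[ht["refName"]] = groups
    return host_templates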

lhoss · Jun 11 '21 11:06

I definitely agree this is a useful thing to have. In theory it shouldn't be too hard to do a naive mapping from an exported cluster template into a cluster.yml; however, a few things might be tricky:

  • Not all running clusters have host templates, so you may need to reverse-engineer those.
  • Any configs that are created by the overlays (TLS, Kerberos, OOM/heapdump settings, logdir settings, etc.) would be hard to re-overlay.
  • We don't currently support custom role groups, so things could fall down slightly there.

But if you fancy a stab, I'd be happy to review and contribute. It could probably work quite nicely in j2, and then it's potentially possible to offer a "clone-a-cluster" type feature, since we're already integrated with Ansible.
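
For what it's worth, a minimal sketch of the j2 idea, rendering the dicts from the sketches above through an inline Jinja2 template (plain jinja2 shown here for brevity; in cloudera-deploy this would presumably be an Ansible template task instead):

import jinja2

CLUSTER_YML_J2 = """\
clusters:
  - name: {{ name }}
    services: [{{ services | join(', ') }}]
    configs:
{%- for service, roles in configs.items() %}
      {{ service }}:
{%- for role, kv in roles.items() %}
        {{ role }}:
{%- for key, value in kv.items() %}
          {{ key }}: {{ value }}
{%- endfor %}
{%- endfor %}
{%- endfor %}
"""

print(jinja2.Template(CLUSTER_YML_J2).render(
    name="Basic Cluster",
    services=["HDFS", "YARN", "ZOOKEEPER"],
    configs={"ZOOKEEPER": {"SERVICEWIDE": {"zookeeper_datadir_autocreate": "true"}}},
))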

tmgstevens · Jun 14 '21 08:06