Drive terraform from ansible inventory
Release Notes
Define infrastructure using Ansible inventory variables instead of Terraform variables:
- Infrastructure is defined by new inventory vars cluster_* and node_* in an environments/<environment>/inventory/cluster.yml inventory file. See ansible/roles/terraform/README.md for available variables.
- Infrastructure is provisioned using a new playbook:
ansible-playbook ansible/provision.yml
New/changed functionality
- Any node_*-prefixed var can vary per-node, so it can be defined using groupvars (all or specific groups) and/or hostvars, allowing far more flexibility in node/group definition.
- More flexible definition of node names, which may now also use host ranges.
- Support for multiple network interfaces per node.
- Instance hostnames are now FQDNs.
- Default terraform templates add an instance_id variable to hosts to support operations on specific instances even with multiple identical hostnames.
- TODO: what else?
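To illustrate what a host range in a node name stands for, here is a minimal Python sketch that expands a single numeric range the way Ansible inventory ranges behave (both ends inclusive, leading zeros preserved). The helper name expand_host_range is hypothetical and is not part of this PR; Ansible performs this expansion itself when it parses the inventory.

```python
import re

# Matches a single numeric range such as "[0:10]" inside a host pattern.
_RANGE = re.compile(r"\[(\d+):(\d+)\]")


def expand_host_range(pattern: str) -> list[str]:
    """Expand one numeric range in a host pattern; both ends are inclusive.

    Hypothetical helper for illustration only.
    """
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]  # no range present: the pattern is a single host name
    start, end = match.group(1), match.group(2)
    # Keep zero padding when the start is written with leading zeros, e.g. [01:03].
    width = len(start) if len(start) > 1 and start.startswith("0") else 0
    prefix, suffix = pattern[: match.start()], pattern[match.end():]
    return [
        f"{prefix}{str(i).zfill(width)}{suffix}"
        for i in range(int(start), int(end) + 1)
    ]


hosts = expand_host_range("compute-[0:10]")
print(len(hosts), hosts[0], hosts[-1])  # 11 compute-0 compute-10
```

Note that because the range is inclusive, compute-[0:10] defines eleven nodes, not ten.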
Configuration
- If creating a new environment with cookiecutter, a file environments/<environment>/inventory/cluster.yml will be created which contains an example infrastructure definition. This should be modified as required.
- Defaults are defined in environments/common/inventory/cluster.yml. Note that this is lower-precedence than groupvars for the all group.
- If necessary, the default terraform templates can also be replaced or extended with environment-specific ones. See TODO:
- [ ] TODO: Add skeleton cluster config file.
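As a rough mental model of how a node_* value ends up being resolved for one host, the sketch below merges variable layers from lowest to highest precedence. It is a deliberate simplification of Ansible's real precedence rules, and the variable names node_flavor and node_image are hypothetical placeholders; see ansible/roles/terraform/README.md for the actual variables.

```python
def resolve_node_vars(*layers: dict) -> dict:
    """Merge variable layers given lowest precedence first; later layers win."""
    resolved: dict = {}
    for layer in layers:
        resolved.update(layer)
    return resolved


# All names and values below are hypothetical, for illustration only.
common_defaults = {"node_flavor": "small", "node_image": "base"}  # common inventory defaults
all_groupvars = {"node_flavor": "medium"}                          # groupvars for the all group
compute_groupvars = {"node_image": "compute"}                      # groupvars for a specific group
host_vars = {"node_flavor": "large"}                               # hostvars for a single node

print(resolve_node_vars(common_defaults, all_groupvars, compute_groupvars, host_vars))
# {'node_flavor': 'large', 'node_image': 'compute'}
```

The point of the layering is that a default only needs to be overridden at the narrowest scope where it actually differs.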
Upgrading
Note that this functionality is opt-in; any current terraform in e.g. environments/<environment>/terraform/ will not be replaced unless the ansible/provision.yml playbook is run. To "upgrade" a cluster using the previous Terraform:
1. Delete all Terraform templates TODO, but NOT the state files.
2. Copy the skeleton file TODO to environments/<environment>/inventory/cluster.yml
3. Modify inventory variables and/or Terraform templates.
4. Run the ansible/provision.yml playbook and cancel the apply if Terraform says there will be changes.
5. Repeat steps 3 and 4 until Terraform reports no changes.
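The "no changes" check in the steps above can also be scripted outside the playbook. The sketch below is one possible approach, not something this PR provides: it relies on terraform plan -detailed-exitcode, which exits with 0 when the plan is empty, 2 when changes are pending, and 1 on error. The terraform_dir argument is a placeholder, since where the rendered Terraform lives is not stated here.

```python
import subprocess


def interpret_plan_exit_code(code: int) -> bool:
    """Map a `terraform plan -detailed-exitcode` exit code to 'changes pending?'."""
    if code == 0:
        return False  # plan succeeded and the diff is empty
    if code == 2:
        return True   # plan succeeded and changes are pending
    raise RuntimeError(f"terraform plan failed with exit code {code}")


def plan_has_changes(terraform_dir: str) -> bool:
    """Run a plan in terraform_dir (placeholder path) and report pending changes."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=terraform_dir,
    )
    return interpret_plan_exit_code(result.returncode)
```

A plan never modifies infrastructure, so this check is safe to repeat after each round of edits to the inventory variables or templates.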
Design Notes
- Currently the "inputs" to the cluster are split between TF and ansible variables. This PR drives TF from ansible, so ansible is the "single source of truth".
- This design uses an actual inventory file to define the infrastructure, meaning inventory hosts are defined before provisioning. This means:
   a. groupvars and hostvars can be used when templating ansible, as opposed to the CaaS approach where only groupvars for the all group are available from the default localhost.
   b. With some tweaks to the stackhpc.openhpc role, environment-specific image builds will be possible without having deployed a cluster (as e.g. the slurm control hostname is already known).
   c. Ansible host patterns can be used to define node names, e.g. compute-[0:10].
- inventory_hostnames are now a short name (e.g. control) which does not contain the cluster name. Actual hostnames are an FQDN including the cluster name. This change is rolled through to ansible/roles/etc_hosts and ansible/adhoc/rebuild.yml. This maintains the current (pervasive!) assumption that inventory_hostnames are resolvable names.
- The community.general.terraform module is wrapped to (by default) require user confirmation before making changes to infra.
TODO:
- [ ] Remove the terraform in environments/skeleton/{{cookiecutter.environment}}/ (leaving it in currently to make merges easier).