vsphere_rancher_cluster
Terraform plan for creating a hardened multi-node RKE2 cluster on VMware vSphere
RKE2 Cluster with vSphere CPI/CSI & kube-vip
Reason for Being
This Terraform plan creates a multi-node, CIS-benchmarked RKE2 cluster with the vSphere CPI/CSI and kube-vip installed and configured. RKE2's NGINX Ingress Controller is also exposed as a LoadBalancer service so that it works in concert with kube-vip. Beyond these quality-of-life additions, the plan takes RKE2's standard security posture a couple of steps further: it installs with the CIS 1.23 profile enabled, uses Calico's WireGuard backend to encrypt pod-to-pod communication, and enforces the use of TLS 1.3 across control plane components.
There is a lot of HereDoc in the rke_config section of cluster.tf so that it's easier to see what's going on. You'll probably want to move this content into a template file (e.g. rendered with Terraform's templatefile() function) to keep the plan neater than what's seen here.
Some operating systems run containerd with the "systemd" cgroup driver while the kubelet uses the "cgroupfs" cgroup driver. This plan passes --cgroup-driver=systemd to the kubelet so that only a single cgroup manager is in use, better aligning the cluster with upstream Kubernetes recommendations (see: https://kubernetes.io/docs/setup/production-environment/container-runtimes/#cgroup-drivers).
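To confirm on a node that the cgroup hierarchy and containerd's driver line up with the kubelet setting, a check like the one below can be run. It is a hedged sketch, not part of this plan: the function names are hypothetical, and it assumes RKE2's generated containerd config lives at /var/lib/rancher/rke2/agent/etc/containerd/config.toml and carries a SystemdCgroup key in its runc options.

```shell
#!/usr/bin/env bash
# Hypothetical node-side checks (not part of this plan).

# Map the filesystem type of /sys/fs/cgroup to a cgroup version. With no
# argument, it inspects the local host via stat.
cgroup_version() {
  local fstype="${1:-$(stat -fc %T /sys/fs/cgroup/)}"
  case "$fstype" in
    cgroup2fs) echo "v2" ;;   # unified hierarchy
    tmpfs)     echo "v1" ;;   # legacy/hybrid hierarchy
    *)         echo "unknown" ;;
  esac
}

# Succeed if a containerd config enables the systemd cgroup driver for runc.
# Defaults to the path RKE2 generates (an assumption; adjust if relocated).
containerd_uses_systemd() {
  local cfg="${1:-/var/lib/rancher/rke2/agent/etc/containerd/config.toml}"
  grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$cfg"
}
```

For example, `cgroup_version` prints v2 on a unified-hierarchy host, and `containerd_uses_systemd && echo systemd` confirms that containerd matches the kubelet's --cgroup-driver=systemd.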
Static IP Addressing
Static IPs can be implemented if needed. Firstly, a Network Protocol Profile needs to be created in vSphere. After the profile is created, two parts of this Terraform plan need to be changed: cloud-init and the rancher2_machine_config_v2 resource in cluster.tf.
- A script must be added with write_files and executed via runcmd in cloud-init. This script gathers instance metadata via VMware Tools (vmtoolsd) and then applies it (the example below uses Netplan; your OS, however, may use something different):
```yaml
- content: |
    #!/bin/bash
    # Dump the OVF environment that vSphere passes in via vApp properties
    vmtoolsd --cmd 'info-get guestinfo.ovfEnv' > /tmp/ovfenv
    # Extract each network setting from the OVF environment XML
    IPAddress=$(sed -n 's/.*Property oe:key="guestinfo.interface.0.ip.0.address" oe:value="\([^"]*\).*/\1/p' /tmp/ovfenv)
    SubnetMask=$(sed -n 's/.*Property oe:key="guestinfo.interface.0.ip.0.netmask" oe:value="\([^"]*\).*/\1/p' /tmp/ovfenv)
    Gateway=$(sed -n 's/.*Property oe:key="guestinfo.interface.0.route.0.gateway" oe:value="\([^"]*\).*/\1/p' /tmp/ovfenv)
    DNS=$(sed -n 's/.*Property oe:key="guestinfo.dns.servers" oe:value="\([^"]*\).*/\1/p' /tmp/ovfenv)
    # Write a Netplan config for the first NIC (ens192). Note: the prefix
    # length is hardcoded to /24, so SubnetMask is read but not used.
    cat > /etc/netplan/01-netcfg.yaml <<EOF
    network:
      version: 2
      renderer: networkd
      ethernets:
        ens192:
          addresses:
            - $IPAddress/24
          gateway4: $Gateway
          nameservers:
            addresses: [$DNS]
    EOF
    netplan apply
  path: /root/netplan.sh
```
- The following additions need to be made to rancher2_machine_config_v2. This example applies static IPv4 addresses only to the ctl_plane node pool:
```hcl
vapp_ip_allocation_policy = each.key == "ctl_plane" ? "fixedAllocated" : null
vapp_ip_protocol          = each.key == "ctl_plane" ? "IPv4" : null
# "$$" escapes "$" so Terraform passes ${...} through to vSphere literally
vapp_property = each.key == "ctl_plane" ? [
  "guestinfo.interface.0.ip.0.address=ip:<vSwitch_from_Network_Protocol_Profile>",
  "guestinfo.interface.0.ip.0.netmask=$${netmask:<vSwitch_from_Network_Protocol_Profile>}",
  "guestinfo.interface.0.route.0.gateway=$${gateway:<vSwitch_from_Network_Protocol_Profile>}",
  "guestinfo.dns.servers=$${dns:<vSwitch_from_Network_Protocol_Profile>}",
] : null
vapp_transport = each.key == "ctl_plane" ? "com.vmware.guestInfo" : null
```
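The two steps above can be sanity-checked from inside a guest. The sketch below is a hypothetical refactor and check, not code from this plan: ovf_prop generalizes the repeated sed calls in netplan.sh, mask_to_prefix turns SubnetMask into a CIDR prefix length (netplan.sh hardcodes /24), netplan_address combines the two, and missing_vapp_props reports which vApp properties set in rancher2_machine_config_v2 did not arrive in the OVF environment. It assumes the same `oe:key="KEY" oe:value="VALUE"` layout the original sed expressions rely on.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of this plan) for the static IP workflow.
# Each reads an OVF environment document on stdin, e.g. from:
#   vmtoolsd --cmd 'info-get guestinfo.ovfEnv'

# vApp properties expected from rancher2_machine_config_v2.
required_keys=(
  guestinfo.interface.0.ip.0.address
  guestinfo.interface.0.ip.0.netmask
  guestinfo.interface.0.route.0.gateway
  guestinfo.dns.servers
)

# Print the value of one OVF property (same sed expression as netplan.sh).
ovf_prop() {
  sed -n "s/.*Property oe:key=\"$1\" oe:value=\"\([^\"]*\).*/\1/p"
}

# Convert a dotted-quad netmask (e.g. 255.255.255.0) to a prefix length.
# Fails on malformed or non-contiguous masks.
mask_to_prefix() {
  local -a octets
  local octet value bit prefix=0 seen_zero=0
  IFS=. read -r -a octets <<<"$1"
  (( ${#octets[@]} == 4 )) || return 1
  for octet in "${octets[@]}"; do
    [[ $octet =~ ^[0-9]{1,3}$ ]] || return 1
    value=$((10#$octet))
    (( value <= 255 )) || return 1
    for bit in 128 64 32 16 8 4 2 1; do
      if (( value & bit )); then
        if (( seen_zero )); then return 1; fi  # a 1 bit after a 0 bit
        prefix=$((prefix + 1))
      else
        seen_zero=1
      fi
    done
  done
  echo "$prefix"
}

# Print "<ip>/<prefix>" for the Netplan addresses list.
netplan_address() {
  local ovfenv ip prefix
  ovfenv=$(cat)
  ip=$(ovf_prop guestinfo.interface.0.ip.0.address <<<"$ovfenv")
  prefix=$(mask_to_prefix "$(ovf_prop guestinfo.interface.0.ip.0.netmask <<<"$ovfenv")") || return 1
  [ -n "$ip" ] || return 1
  echo "$ip/$prefix"
}

# Print each required property that is absent or empty; fail if any are.
missing_vapp_props() {
  local ovfenv key rc=0
  ovfenv=$(cat)
  for key in "${required_keys[@]}"; do
    if [ -z "$(ovf_prop "$key" <<<"$ovfenv")" ]; then
      echo "$key"
      rc=1
    fi
  done
  return "$rc"
}
```

On a ctl_plane node, `vmtoolsd --cmd 'info-get guestinfo.ovfEnv' | missing_vapp_props` prints nothing when every property arrived, and piping the same output into `netplan_address` yields the address in IP/prefix form.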
Using static IPs comes with some small caveats:
- In lieu of "traditional" cloud-init logic to handle OS updates/upgrades & package installs:
```yaml
package_reboot_if_required: true
package_update: true
package_upgrade: true
packages:
  - <insert_awesome_package_name_here>
```
Scripting would need to be introduced to take care of this later in the cloud-init process, if desired (e.g. a write_files entry using defer: true). The package_* directives run before the runcmd script that applies the static IP, so at that point the node would not have an IP available to complete any package_* logic requiring network access.
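A minimal sketch of that scripting, assuming an apt-based OS (consistent with the Netplan example) and that it runs after netplan.sh, e.g. as a later runcmd entry. retry, has_default_route, and update_packages are hypothetical names; the script waits for a default route before touching the package manager, which is exactly what the package_* directives cannot do.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for package_update / package_upgrade / packages.
# Assumes an apt-based OS and that the static IP is applied before this runs.

# Run a command up to $1 times, sleeping $2 seconds between failed attempts.
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      return 0
    fi
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  return 1
}

# Succeed once the node has a default route (i.e. netplan.sh has run).
has_default_route() {
  [ -n "$(ip route show default 2>/dev/null)" ]
}

# Update, upgrade, and install the given packages once networking is up.
update_packages() {
  retry 30 10 has_default_route || { echo "no default route" >&2; return 1; }
  export DEBIAN_FRONTEND=noninteractive
  retry 3 15 apt-get update || return 1
  apt-get -y upgrade || return 1
  if [ "$#" -gt 0 ]; then
    apt-get -y install "$@" || return 1
  fi
}
```

Calling `update_packages <insert_awesome_package_name_here>` covers package_update, package_upgrade, and packages; retrying apt-get update guards against a network that is up but not yet fully reachable.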
Environment Prerequisites
- Functional Rancher Management Server with vSphere Cloud Credential
- vCenter >= 7.x and credentials with appropriate permissions (see vSphere Permissions section)
- Virtual Machine Hardware Compatibility at Version >= 15
- Create the following in the files/ directory:

NAME | PURPOSE |
---|---|
.rancher-api-url | URL for Rancher Management Server |
.rancher-bearer-token | API bearer token generated via Rancher UI |
.ssh-public-key | SSH public key for additional OS user |
.vsphere-passwd | Password associated with vSphere CPI/CSI credential |
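Before running Terraform, a quick preflight check can confirm these files are in place. A minimal sketch (hypothetical, not shipped with this plan) that assumes it is run from the plan's root directory, so the files live under files/:

```shell
#!/usr/bin/env bash
# Hypothetical preflight check (not part of this plan): confirm that each
# file the plan expects exists under files/ and is non-empty.

required_files=(
  .rancher-api-url
  .rancher-bearer-token
  .ssh-public-key
  .vsphere-passwd
)

# Report every missing or empty file; fail if any were found.
check_files() {
  local dir="${1:-files}" f rc=0
  for f in "${required_files[@]}"; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `check_files && terraform apply`. Since .rancher-bearer-token and .vsphere-passwd hold secrets, restricting them with `chmod 600` is sensible.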
vSphere Permissions
For required vSphere CPI & CSI account permissions see HERE.
Caveats
- vSphere CSI volumes are RWO (ReadWriteOnce) only unless using a vSAN datastore
- Using WireGuard as the CNI backend comes with a performance penalty (see https://projectcalico.docs.tigera.io/security/encrypt-cluster-pod-traffic)
- kube-vip is configured in L2 mode, so ALL LoadBalancer service traffic goes only to the node that holds the VIP, which is not suitable for production
To Run
```shell
terraform apply
```
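To review the plan before applying it, or to run non-interactively (e.g. in CI), the standard Terraform workflow can be scripted as below. This is a sketch, not part of this plan; TF is a hypothetical override for the terraform binary, and saving the plan to tfplan ensures apply executes exactly what was planned.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of this plan); override TF to pin a binary.
TF="${TF:-terraform}"

# Initialize, save a plan, then apply exactly that saved plan.
deploy() {
  "$TF" init -input=false &&
    "$TF" plan -input=false -out=tfplan &&
    "$TF" apply -input=false tfplan
}
```

Run `deploy` from the plan's root directory. Applying a saved plan file does not prompt for confirmation, so inspect the `plan` output (or `terraform show tfplan`) first.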
Tested Versions
SOFTWARE | VERSION | DOCS |
---|---|---|
kube-vip | 0.6.2 | https://kube-vip.io/docs/ |
Rancher Server | 2.7.6 | https://ranchermanager.docs.rancher.com/ |
Rancher Terraform Provider | 3.1.1 | https://registry.terraform.io/providers/rancher/rancher2/latest/docs |
RKE2 | 1.26.8+rke2r1 | https://docs.rke2.io |
Terraform | 1.4.6 | https://www.terraform.io/docs |
vSphere | 8.0.1.00300 | https://docs.vmware.com/en/VMware-vSphere/index.html |