[feature] Add vSphere Infrastructure Provider
Problem Description
Currently, Omni supports various infrastructure providers (e.g., Equinix Metal, AWS), but lacks native integration with VMware vSphere. Many organizations run Kubernetes clusters on-premises using vSphere as their primary virtualization platform. Managing the underlying virtual machine lifecycle (provisioning, scaling, deletion) for Talos nodes within vSphere currently requires separate tooling or manual processes outside of Omni.
Solution
We propose adding a new infrastructure provider specifically for VMware vSphere. This provider should integrate with the vSphere API (via credentials provided in Omni configuration) to manage the lifecycle of virtual machines intended as Talos cluster nodes.
- Node Provisioning: Create new virtual machines based on a specified template (e.g., a Talos OVA or a custom VM template) within a target vSphere cluster/datacenter/resource pool/folder. This should include configuring CPU, memory, disk, network settings, and potentially passing cloud-init or other configuration data (like the Talos machine configuration).
- Node Scaling:
  - Scale Up: Provision additional VM nodes and integrate them into the target Talos cluster managed by Omni.
  - Scale Down: Gracefully remove nodes from the Talos cluster and subsequently delete the corresponding virtual machines from vSphere.
- Node Discovery/Inventory (Optional but helpful): Ability to list or recognize VMs managed by Omni within the vSphere environment.
- Configuration: Allow users to specify vSphere connection details (vCenter URL, credentials, insecure skip verify), target environment (datacenter, cluster, resource pool, datastore, network port group/dvSwitch), and VM template details.
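To make the requested settings and the scale up/down behavior concrete, here is a minimal Python sketch. Nothing in it is an existing Omni or vSphere API: `VSphereProviderConfig`, its field names, `plan_scaling`, and the `talos-node` naming prefix are all hypothetical, chosen only to mirror the bullets above. It makes no calls to vCenter; it only validates settings and computes which VMs would be created or deleted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VSphereProviderConfig:
    """Hypothetical settings mirroring the Configuration bullet (not a real Omni schema)."""

    vcenter_url: str
    username: str
    password: str
    datacenter: str
    cluster: str
    datastore: str
    network: str  # port group or dvSwitch network name
    template: str  # VM template or imported Talos OVA
    resource_pool: str = ""
    folder: str = ""
    insecure_skip_verify: bool = False

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the config is usable."""
        problems = []
        if not self.vcenter_url.startswith("https://"):
            problems.append("vcenter_url must start with https://")
        required = ("username", "password", "datacenter", "cluster",
                    "datastore", "network", "template")
        for name in required:
            if not getattr(self, name):
                problems.append(f"{name} is required")
        return problems


def plan_scaling(managed_vms: list[str], desired: int,
                 prefix: str = "talos-node") -> tuple[list[str], list[str]]:
    """Return (names_to_create, names_to_delete) needed to reach `desired` nodes.

    `managed_vms` is the inventory of VM names the provider already owns.
    """
    if desired < 0:
        raise ValueError("desired must be non-negative")
    current = sorted(managed_vms)
    if desired >= len(current):
        # Scale up: pick the lowest unused numeric suffixes for new VM names.
        existing = set(current)
        to_create: list[str] = []
        index = 0
        while len(current) + len(to_create) < desired:
            name = f"{prefix}-{index}"
            if name not in existing:
                to_create.append(name)
            index += 1
        return to_create, []
    # Scale down: drop the last names in sorted order (an arbitrary policy
    # assumed for this sketch; a real provider might prefer unhealthy nodes).
    return [], current[desired:]
```

In a real provider, each name returned for deletion would first be removed gracefully from the Talos cluster, and only then would the VM be deleted from vSphere, as the Scale Down bullet describes.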
Alternative Solutions
- Manually managing VMs in vSphere and then bootstrapping Talos/Omni. This lacks the automation and integrated scaling capabilities desired.
- Using Terraform with the vSphere provider alongside Omni. While possible, this creates a separate workflow layer; native integration within Omni would be more streamlined.
Additional context
Integrating vSphere would significantly broaden Omni's applicability for on-premises Kubernetes deployments using Talos Linux. It aligns with Omni's goal of simplifying cluster lifecycle management across different infrastructures. The popular HashiCorp Terraform vSphere provider (https://registry.terraform.io/providers/hashicorp/vsphere/latest/docs) could serve as a reference for necessary configuration parameters and API interactions.
Notes
Thank you for the detailed suggestion. It is certainly doable and would be great to have, but is not something we are planning to do at the moment.
We've had the request for a vSphere Omni Provider frequently, but one of the questions I keep asking is: are you planning to stick with vSphere? Almost every company I've talked to plans to move off of vSphere over the next 3-5 years, as the costs have increased and competition has matured.
Do you know if you have plans to stick with vSphere long term as your primary on-prem VM infrastructure?
At the current moment we are planning to stick with vSphere.
+1
we're also on vSphere for the foreseeable future
Broadcom have basically killed vSphere. They've kept the details pretty quiet, but smaller players can no longer buy licenses. "Small" is relative: we have around 1k VMs.
We have an alpha version of the vSphere infrastructure provider available here. We're still working on adding docs and cutting a release, but please give it a test and open issues on that repo for bugs and feature requests.
https://github.com/siderolabs/omni-infra-provider-vsphere