eks-anywhere
Add Analyzer for Bad vSphere Permissions
What would you like to be added:
Add an analyzer that detects when cluster creation failed due to incorrect vSphere VM cloning permissions.
Why is this needed:
When provisioning a new management cluster, if the vSphere permissions are set incorrectly such that EKS-A cannot clone a VM for the control plane, it is difficult to determine the source of the problem.
The CLI hangs for over an hour and then fails with little explanation.
Improving the messaging around this failure mode would improve the product UX.
Right now, to see where the failure is, you need to look up this object's definition:
kubectl --kubeconfig mgmt-3/generated/mgmt-3.kind.kubeconfig get vspheremachines.infrastructure.cluster.x-k8s.io mgmt-3-control-plane-template-1659617090412-9qsck -n eksa-system -o yaml
The object definition will look as follows. Notice the failed conditions under status at the bottom.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachine
metadata:
  annotations:
    cluster.x-k8s.io/cloned-from-groupkind: VSphereMachineTemplate.infrastructure.cluster.x-k8s.io
    cluster.x-k8s.io/cloned-from-name: mgmt-3-control-plane-template-1659617090412
  creationTimestamp: "2022-08-04T12:44:53Z"
  finalizers:
  - vspheremachine.infrastructure.cluster.x-k8s.io
  generation: 1
  labels:
    cluster.x-k8s.io/cluster-name: mgmt-3
    cluster.x-k8s.io/control-plane: ""
  name: mgmt-3-control-plane-template-1659617090412-9qsck
  namespace: eksa-system
  ownerReferences:
  - apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: mgmt-3
    uid: 1dee2b12-e8d2-414c-aac0-3c179de40253
  - apiVersion: cluster.x-k8s.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Machine
    name: mgmt-3-dgtdf
    uid: d9c12533-8d5d-4108-9d2b-78c5bae7ffce
  - apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: VSphereCluster
    name: mgmt-3
    uid: 330c5c94-a082-453d-aaa9-0488ccbc1b1a
  resourceVersion: "1597"
  uid: b83d3d37-2e42-4a31-a6f7-11392e504a90
spec:
  cloneMode: linkedClone
  datacenter: Datacenter
  datastore: /Datacenter/datastore/datastore1
  diskGiB: 25
  folder: /Datacenter/vm/jwmeier/permissiontest
  memoryMiB: 2048
  network:
    devices:
    - dhcp4: true
      networkName: /Datacenter/network/VM Network
  numCPUs: 2
  resourcePool: /Datacenter/host/Cluster-01/Resources/TestResourcePool
  server: ********
  template: /Datacenter/vm/Templates/bottlerocket-v1.22.10-kubernetes-1-22-eks-9-amd64-f18b278
status:
  conditions:
  - lastTransitionTime: "2022-08-04T12:44:57Z"
    message: 'error trigging clone op for machine infrastructure.cluster.x-k8s.io/v1beta1,
      Kind=VSphereVM eksa-system/mgmt-3-dgtdf: ServerFaultCode: Permission to perform
      this operation was denied.'
    reason: CloningFailed
    severity: Warning
    status: "False"
    type: Ready
  - lastTransitionTime: "2022-08-04T12:44:57Z"
    message: 'error trigging clone op for machine infrastructure.cluster.x-k8s.io/v1beta1,
      Kind=VSphereVM eksa-system/mgmt-3-dgtdf: ServerFaultCode: Permission to perform
      this operation was denied.'
    reason: CloningFailed
    severity: Warning
    status: "False"
    type: VMProvisioned
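As a rough illustration of what such an analyzer could check, here is a minimal Python sketch. It is not built on EKS-A's analyzer framework. The function names (`find_clone_permission_failures`, `analyze`) are hypothetical, and the matching rule (reason `CloningFailed` plus the "Permission to perform this operation was denied" substring) is an assumption based only on the status shown above. It expects the JSON printed by `kubectl get vspheremachines -n eksa-system -o json`.

```python
from __future__ import annotations

import json

# Assumption: the substring reported for a privilege failure, copied from the
# condition message in the object above. Other vSphere faults may word it differently.
PERMISSION_DENIED = "Permission to perform this operation was denied"


def find_clone_permission_failures(machine: dict) -> list[str]:
    """Return the messages of conditions that indicate a denied clone operation."""
    conditions = machine.get("status", {}).get("conditions", [])
    return [
        c.get("message", "")
        for c in conditions
        if c.get("status") == "False"
        and c.get("reason") == "CloningFailed"
        and PERMISSION_DENIED in c.get("message", "")
    ]


def analyze(machine_list_json: str) -> dict[str, list[str]]:
    """Map each affected VSphereMachine name to its failure messages.

    machine_list_json is a Kubernetes List object serialized as JSON,
    i.e. it has a top-level "items" array.
    """
    items = json.loads(machine_list_json).get("items", [])
    results: dict[str, list[str]] = {}
    for machine in items:
        failures = find_clone_permission_failures(machine)
        if failures:
            results[machine["metadata"]["name"]] = failures
    return results
```

A non-empty result from `analyze` would let the CLI report that vSphere denied the clone operation and point the user at their VM cloning privileges, instead of waiting for the timeout.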
This isn't an urgent need because we are implementing vSphere privilege validation and user configuration, but I still want to document the failure mode and note that an analyzer may be useful.