cluster-api-provider-vsphere
Remove dependency for API access of Clusters
/kind feature
Describe the solution you'd like
Currently all clusters created by CAPV have the vsphere-cloud-controller-manager installed, even if I strip out all other CSI components. Without vsphere-cloud-controller-manager, nodes don't get marked as Ready, thus a 2nd or 3rd master node never gets provisioned, for example.
From my understanding vsphere-cloud-controller-manager only does the following things in a non-CSI cluster:
- removing the node.cloudprovider.kubernetes.io/uninitialized taint
- adding the beta.kubernetes.io/instance-type label
- adding the providerID: vsphere://UUID
All of these, even the providerID (https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html#update-all-node-providerid-fields), could be done from outside the cluster, making it possible to run clusters that don't have access to the API of the vSphere environment they run on, which improves security.
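For illustration, those three steps can be performed from outside the cluster with plain kubectl; the node name, UUID and label value below are placeholders:

```sh
# Placeholder values; substitute the real node name and VM UUID.
NODE=worker-01
VM_UUID=4237AB12-34CD-56EF-7890-ABCDEF123456

# Set the provider ID the CCM would otherwise set.
kubectl patch node "$NODE" -p "{\"spec\":{\"providerID\":\"vsphere://$VM_UUID\"}}"

# Optionally add the instance-type label (the value here is made up).
kubectl label node "$NODE" beta.kubernetes.io/instance-type=vsphere-vm --overwrite

# Remove the uninitialized taint so the node can become Ready.
kubectl taint node "$NODE" node.cloudprovider.kubernetes.io/uninitialized-
```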
Environment:
- Cluster-api-provider-vsphere version:
- Kubernetes version: (use kubectl version):
- OS (e.g. from /etc/os-release):
I'm currently using this script to manually ensure the correct data is present:
export GOVC_USERNAME='user'
export GOVC_INSECURE=1
export GOVC_PASSWORD='pw'
export GOVC_URL='server'
DATACENTER='DC'
FOLDER='Folder'
# In my case I'm using a prefix for the VMs, so grep'ing is necessary.
# You can remove it if the folder you are using only contains the machines you need.
VM_PREFIX='smops'
IFS=$'\n'
for vm in $(govc ls "/$DATACENTER/vm/$FOLDER" | grep $VM_PREFIX); do
MACHINE_INFO=$(govc vm.info -json -dc=$DATACENTER -vm.ipath="/$vm" -e=true)
# My VMs are created on vmware with upper case names, so I need to edit the names with awk
VM_NAME=$(jq -r ' .VirtualMachines[] | .Name' <<< $MACHINE_INFO | awk '{print tolower($0)}')
# UUIDs come in lowercase, upper case them
VM_UUID=$( jq -r ' .VirtualMachines[] | .Config.Uuid' <<< $MACHINE_INFO | awk '{print toupper($0)}')
echo "Patching $VM_NAME with UUID:$VM_UUID"
# This is done with --dry-run=client to avoid possible mistakes; remove the flag when you are confident you got everything right.
kubectl patch node $VM_NAME --dry-run=client -p "{\"spec\":{\"providerID\":\"vsphere://$VM_UUID\"}}"
kubectl taint nodes $VM_NAME node.cloudprovider.kubernetes.io/uninitialized-
done
@yastij have you thought about moving the deployment of CCM/CPI/etc to a ClusterResourceSet, once they're available? That should solve this request (assuming one has a separate controller to set the provider ID and remove the taint).
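For illustration, such a ClusterResourceSet could look roughly like the sketch below, assuming the CPI/CCM manifests are wrapped in a ConfigMap named cpi-manifests and clusters opt in via a made-up cloud-provider: external label (the API version was v1alpha3 at the time; adjust to your CAPI release):

```yaml
apiVersion: addons.cluster.x-k8s.io/v1alpha3
kind: ClusterResourceSet
metadata:
  name: vsphere-cpi
  namespace: default
spec:
  clusterSelector:
    matchLabels:
      cloud-provider: external   # hypothetical label set on the Cluster objects that want the CPI
  resources:
    - kind: ConfigMap
      name: cpi-manifests        # ConfigMap wrapping the CCM/CPI yaml to apply to matching clusters
```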
@ncdc - that can be a solution. I'm also thinking about what it would take to run the CPI as part of the management cluster
We are considering moving the external cloud provider components to a ClusterResourceSet once available for CAPZ.
"Without vsphere-cloud-controller-manager nodes dont get marked as ready thus a 2nd or 3rd master node never get provisioned for example." - we are running into exactly this today but for now relying on the user to manually apply the external cloud provider yaml after the first control plane is up https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/master/docs/topics/external-cloud-provider.md.
Just verified that running the CCM from the management cluster works.
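For context, the core of such a setup is one CCM Deployment per workload cluster, running in the management cluster and pointed via --kubeconfig at the kubeconfig Secret that CAPI generates for that cluster. The sketch below is illustrative only; the image tag, namespace, and Secret/ConfigMap names are assumptions to be adapted:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vsphere-cloud-controller-manager
  namespace: bremen                  # namespace holding the workload cluster's CAPI objects
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vsphere-ccm
  template:
    metadata:
      labels:
        app: vsphere-ccm
    spec:
      containers:
        - name: vsphere-cloud-controller-manager
          image: gcr.io/cloud-provider-vsphere/cpi/release/manager:v1.18.1   # illustrative tag
          args:
            - --cloud-provider=vsphere
            - --cloud-config=/etc/cloud/vsphere.conf
            - --kubeconfig=/etc/kubernetes/workload/value   # kubeconfig of the workload cluster
          volumeMounts:
            - name: cloud-config
              mountPath: /etc/cloud
            - name: workload-kubeconfig
              mountPath: /etc/kubernetes/workload
      volumes:
        - name: cloud-config
          configMap:
            name: cloud-config                 # contains vsphere.conf
        - name: workload-kubeconfig
          secret:
            secretName: my-cluster-kubeconfig  # created by CAPI; data key is "value"
```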
So, here is the yaml file that I used to run the CCM from the mgmt cluster: external-ccm-san.yaml.txt. One thing I haven't gotten to work with this PoC: grabbing the vSphere credentials from a Secret instead of a ConfigMap. That always gave me these errors (putting the credentials directly in the config works fine, though):
W0623 17:57:12.514429 1 credentialmanager.go:85] Cannot get secret vsphere-cpi in namespace bremen. error: "secret \"vsphere-cpi\" not found"
E0623 17:57:12.514437 1 credentialmanager.go:54] updateCredentialsMapK8s failed. err=secret "vsphere-cpi" not found
W0623 17:57:12.514443 1 credentialmanager.go:60] secret "vsphere-cpi" not found in namespace "bremen"
E0623 17:57:12.514448 1 credentialmanager.go:75] credentials not found for server vcenter1.sce-dcn.net
I suspect it's not using a local object reference to grab the secret and ended up looking in the wrong namespace. Possibly a bug in the vSphere provider.
Might be. Might also be that I've botched the RoleBindings. The important part is that putting it in the mgmt cluster works without additional changes. And to mitigate this somewhat, we can still just use a Secret for the whole config, not only for the username and password.
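For reference, the credential layout the CPI's credential manager looks for is a Secret with per-vCenter keys, referenced from the [Global] section of vsphere.conf. The sketch below reuses the server and namespace from the log above, but the exact format should be checked against the cloud-provider-vsphere docs for the version in use:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: vsphere-cpi
  namespace: bremen
stringData:
  # keys follow the "<vcenter-server>.username" / "<vcenter-server>.password" pattern
  vcenter1.sce-dcn.net.username: "user"
  vcenter1.sce-dcn.net.password: "pw"
```

with the matching vsphere.conf stanza:

```ini
[Global]
secret-name      = "vsphere-cpi"
secret-namespace = "bremen"
```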
@MaxRink I'm reporting here the TL;DR from the slack thread, please correct me if there is something incomplete or wrong.
- CAPV allows opting out of the CSI and CCM installation via API config
- Without CCM, nodes do not become Ready, so we are exploring the installation of CCM in the management cluster
- installation of CCM in the management cluster happens via the git-ops tooling, in sync with the creation of the workload cluster (1 instance of CCM for each workload cluster)
- the only pending problem is how to pass vSphere credentials to the CCM instances, due to a bug in using secrets. While this gets fixed in CCM, a possible workaround is to set env variables from the secret and let the CCM use credentials from env variables, but this still needs to be tested (see the sketch after this list)
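An untested sketch of that workaround, assuming the CCM reads credentials from environment variables such as VSPHERE_USER/VSPHERE_PASSWORD (the variable names and secret keys are assumptions; verify against the cloud-provider-vsphere docs):

```yaml
# Container spec fragment for the CCM Deployment (illustrative only)
env:
  - name: VSPHERE_USER              # assumed variable name, check the CPI docs
    valueFrom:
      secretKeyRef:
        name: vsphere-cpi
        key: username
  - name: VSPHERE_PASSWORD          # assumed variable name, check the CPI docs
    valueFrom:
      secretKeyRef:
        name: vsphere-cpi
        key: password
```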
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
My understanding of the current status:
- 👍 it's possible to stop installing CPI and CSI, by removing the ClusterResourceSet
- 👍 it's possible to skip removing the node.cloudprovider.kubernetes.io/uninitialized taint, by setting kubeletExtraArgs.cloud-provider to an empty string (see the snippet after this list)
- 👍 adding the beta.kubernetes.io/instance-type label is optional
- 👎 adding the providerID: vsphere://UUID is still needed
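For the taint item, a fragment of what the kubeadm bootstrap config could look like (illustrative; with no cloud-provider set, the kubelet does not add the uninitialized taint):

```yaml
# Fragment of a KubeadmConfigTemplate / KubeadmControlPlane spec (illustrative)
joinConfiguration:
  nodeRegistration:
    kubeletExtraArgs:
      cloud-provider: ""   # empty string: kubelet is not started in external cloud-provider mode
```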
For this last item (adding the providerID):
- I'm currently using the following hack:

  old_IFS=$IFS
  IFS=$'\n'
  for machine in $(KUBECONFIG=/var/opt/kubitus/config/mgmt/kubitus-bootstrap.kubeconfig kubectl get machine --output=custom-columns=NAME:.metadata.name,PROVIDERID:.spec.providerID,PHASE:.status.phase --no-headers=true); do
    machine_name="$(echo $machine | awk '{print $1}')"
    provider_id="$(echo $machine | awk '{print $2}')"
    phase="$(echo $machine | awk '{print $3}')"
    [ "$phase" = "Provisioned" ] || continue
    KUBECONFIG=/var/opt/kubitus/config/mgmt/mgmt.kubeconfig \
      kubectl patch node "$machine_name" -p '{"spec":{"providerID":"'$provider_id'"}}'
  done
  IFS=$old_IFS

- I've created an issue in cluster-api to match machines and nodes by name, this is probably not the correct place, so
I see the following ways forward:
- fix this in cluster-api-provider-vsphere, with a new field (proposal: vspheremachine.spec.providerIDFromSystemUUID), building ProviderID from node.Status.NodeInfo.SystemUUID using the same function as in CPI (a rough sketch of the mechanics follows this list)
- fix this in cloud-provider-vsphere, by adding a mode that works without access to vSphere
- fix this with a new "tiny" provider
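To illustrate the mechanics of the first option only (a hedged sketch, not the proposed implementation): the provider ID can be derived from the kubelet-reported SystemUUID; the real CAPV change would reuse the CPI's conversion helper so that any case or byte-order normalisation matches what the CPI would have set.

```sh
# Rough illustration of the SystemUUID -> providerID mapping from option 1.
for node in $(kubectl get nodes -o name); do
  uuid=$(kubectl get "$node" -o jsonpath='{.status.nodeInfo.systemUUID}')
  # A real implementation would normalise the UUID the same way the CPI does.
  kubectl patch "$node" -p "{\"spec\":{\"providerID\":\"vsphere://$uuid\"}}"
done
```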
See also somewhat related issue in CSI: https://github.com/kubernetes-sigs/vsphere-csi-driver/issues/1742
EDIT 2022-06-01: Added third way forward
@srm09 @MaxRink @yastij WDYT about my proposed ways forward? I can propose a PR for solution 1 (solution 2 is harder for me, and solution 3 means a new repo probably).