cluster-api-provider-vsphere

Remove dependency for API access of Clusters

Open MaxRink opened this issue 5 years ago • 17 comments

/kind feature

Describe the solution you'd like: Currently all clusters created by CAPV have the vsphere-cloud-controller-manager installed, even if I strip out all other CSI components. Without vsphere-cloud-controller-manager, nodes don't get marked as Ready, so, for example, a second or third master node never gets provisioned.

From my understanding vsphere-cloud-controller-manager only does the following things in a non-CSI cluster:

  • removing the node.cloudprovider.kubernetes.io/uninitialized taint
  • adding the beta.kubernetes.io/instance-type label
  • adding the providerID: vsphere://UUID

All of these, even setting the providerID (https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html#update-all-node-providerid-fields), could be done from outside the cluster. That would make it possible to run clusters that have no access to the API of the vSphere environment they run on, which improves security.
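
For illustration, a minimal sketch of doing all three steps from outside the workload cluster with plain kubectl (node name, UUID, and the instance-type value are placeholders, not taken from any real cluster):

NODE=worker-0                             # hypothetical node name
UUID=4237ab12-cd34-ef56-ab78-90cd12ef34ab # hypothetical vSphere VM UUID

# 1. Set the providerID (immutable once set, so get it right the first time).
kubectl patch node "$NODE" -p "{\"spec\":{\"providerID\":\"vsphere://$UUID\"}}"
# 2. Add the instance-type label the CCM would have set (value is an assumption).
kubectl label node "$NODE" beta.kubernetes.io/instance-type=vsphere-vm
# 3. Remove the uninitialized taint so the node can become Ready.
kubectl taint nodes "$NODE" node.cloudprovider.kubernetes.io/uninitialized-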

Environment:

  • Cluster-api-provider-vsphere version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

MaxRink avatar Jun 03 '20 10:06 MaxRink

I'm currently using this script to manually put the correct data in place:


export GOVC_USERNAME='user'
export GOVC_INSECURE=1
export GOVC_PASSWORD='pw'
export GOVC_URL='server'
DATACENTER='DC'
FOLDER='Folder'

# In my case I'm using a prefix for the VMs, so grep'ing is necessary.
# You can remove it if the folder you are using only contains the machines you need.
VM_PREFIX='smops'
IFS=$'\n'
for vm in $(govc ls "/$DATACENTER/vm/$FOLDER" | grep $VM_PREFIX); do
  MACHINE_INFO=$(govc vm.info -json -dc=$DATACENTER -vm.ipath="/$vm" -e=true)
  # My VMs are created in vSphere with upper-case names, so I lower-case them with awk.
  VM_NAME=$(jq -r ' .VirtualMachines[] | .Name' <<< $MACHINE_INFO | awk '{print tolower($0)}')
  # UUIDs come in lowercase, so upper-case them.
  VM_UUID=$( jq -r ' .VirtualMachines[] | .Config.Uuid' <<< $MACHINE_INFO | awk '{print toupper($0)}')
  echo "Patching $VM_NAME with UUID:$VM_UUID"
  # This runs as a dry run to avoid possible mistakes; drop the flag when you are confident you got everything right.
  kubectl patch node $VM_NAME -p "{\"spec\":{\"providerID\":\"vsphere://$VM_UUID\"}}" --dry-run=client
  kubectl taint nodes $VM_NAME node.cloudprovider.kubernetes.io/uninitialized-
done

MaxRink avatar Jun 03 '20 15:06 MaxRink

@yastij have you thought about moving the deployment of CCM/CPI/etc to a ClusterResourceSet, once they're available? That should solve this request (assuming one has a separate controller to set the provider ID and remove the taint).
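
For context, a rough sketch of what that could look like once ClusterResourceSet lands (all names here are assumptions, and the referenced ConfigMap holding the CPI manifests would have to exist):

# Hypothetical ClusterResourceSet applying a CPI manifest (stored in a
# ConfigMap named cloud-provider-vsphere) to every cluster labeled cpi=external.
kubectl apply -f - <<EOF
apiVersion: addons.cluster.x-k8s.io/v1alpha3
kind: ClusterResourceSet
metadata:
  name: vsphere-cpi
  namespace: default
spec:
  clusterSelector:
    matchLabels:
      cpi: external
  resources:
    - name: cloud-provider-vsphere
      kind: ConfigMap
EOF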

ncdc avatar Jun 03 '20 15:06 ncdc

@ncdc - that can be a solution. I'm also thinking about what it would take to run the CPI as part of the management cluster

yastij avatar Jun 03 '20 16:06 yastij

We are considering moving the external cloud provider components to a ClusterResourceSet once available for CAPZ.

"Without vsphere-cloud-controller-manager nodes dont get marked as ready thus a 2nd or 3rd master node never get provisioned for example." - we are running into exactly this today but for now relying on the user to manually apply the external cloud provider yaml after the first control plane is up https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/master/docs/topics/external-cloud-provider.md.

CecileRobertMichon avatar Jun 10 '20 17:06 CecileRobertMichon

Just verified that running the CCM from the management cluster works.

MaxRink avatar Jun 18 '20 21:06 MaxRink

So, here is the yaml file I used to run the CCM from the mgmt cluster: external-ccm-san.yaml.txt. One thing I haven't gotten to work with this PoC: grabbing the vSphere credentials from a secret instead of a configmap. That always gave me the errors below. With the credentials directly in the config, everything works fine, though.

W0623 17:57:12.514429       1 credentialmanager.go:85] Cannot get secret vsphere-cpi in namespace bremen. error: "secret \"vsphere-cpi\" not found"
E0623 17:57:12.514437       1 credentialmanager.go:54] updateCredentialsMapK8s failed. err=secret "vsphere-cpi" not found
W0623 17:57:12.514443       1 credentialmanager.go:60] secret "vsphere-cpi" not found in namespace "bremen"
E0623 17:57:12.514448       1 credentialmanager.go:75] credentials not found for server vcenter1.sce-dcn.net
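
For reference, a sketch of the secret layout that cloud-provider-vsphere's credential manager expects, using the server and namespace from the log above (key format per the cloud-provider-vsphere docs; whether this resolves the error here is untested):

# The credential manager looks up keys named "<vcenter-host>.username" and
# "<vcenter-host>.password" in the secret referenced by the secret-name and
# secret-namespace settings in vsphere.conf.
kubectl -n bremen create secret generic vsphere-cpi \
  --from-literal='vcenter1.sce-dcn.net.username=user' \
  --from-literal='vcenter1.sce-dcn.net.password=pw'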

MaxRink avatar Jun 23 '20 18:06 MaxRink

I suspect it's not using a local object reference to grab the secret and ended up looking in the wrong namespace. Possibly a bug in the vSphere provider.

randomvariable avatar Jun 23 '20 18:06 randomvariable

Might be. Might also be that I've botched the rolebindings. The important part is that putting it in the mgmt cluster works without additional changes. And to mitigate this somewhat, we can still use a secret for the whole config, not just for the password and username, as sketched below.
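
A minimal sketch of that mitigation (file and secret names are assumptions): keep the entire vsphere.conf, credentials included, in a secret and mount that into the CCM pod instead of a configmap.

# Create a secret holding the complete cloud config, including credentials.
kubectl -n kube-system create secret generic vsphere-cloud-config \
  --from-file=vsphere.conf=./vsphere.conf
# The CCM deployment then mounts this secret (volume type "secret") where it
# previously mounted the configmap, e.g. at /etc/cloud/vsphere.conf.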

MaxRink avatar Jun 23 '20 19:06 MaxRink

@MaxRink I'm reporting the TL;DR from the Slack thread here; please correct me if something is incomplete or wrong.

  • CAPV allows opting out of CSI and CCM installation via API config
  • Without CCM, nodes do not become ready, so we are exploring installing the CCM in the management cluster
  • installation of the CCM in the management cluster happens via the GitOps tooling, in sync with the creation of the workload cluster (one CCM instance per workload cluster)
  • the only pending problem is how to pass vSphere credentials to the CCM instances, due to a bug in using secrets. While this gets fixed in CCM, a possible workaround is to set env variables from the secret and let the CCM read credentials from those variables, but this should be tested (see the sketch below)
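
A possible shape of that workaround, assuming the CCM runs as a daemonset named vsphere-cloud-controller-manager and actually reads the resulting variable names (both assumptions, untested):

# kubectl set env can project every key of a secret into the container
# environment; keys must be valid env var names, and whether CPI reads
# these particular variables has to be verified.
kubectl -n kube-system set env daemonset/vsphere-cloud-controller-manager \
  --from=secret/vsphere-cpi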

fabriziopandini avatar Aug 19 '20 10:08 fabriziopandini

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar Feb 14 '21 12:02 fejta-bot

My understanding of the current status:

For this last item:

  • I'm currently using the following hack:

    old_IFS=$IFS
    IFS=$'\n'
    # Copy spec.providerID from each provisioned CAPI Machine (read from the
    # bootstrap cluster) to the Node of the same name in the target cluster.
    for machine in $(KUBECONFIG=/var/opt/kubitus/config/mgmt/kubitus-bootstrap.kubeconfig kubectl get machine --output=custom-columns=NAME:.metadata.name,PROVIDERID:.spec.providerID,PHASE:.status.phase --no-headers=true); do
      machine_name="$(echo $machine | awk '{print $1}')"
      provider_id="$(echo $machine | awk '{print $2}')"
      phase="$(echo $machine | awk '{print $3}')"
      [ "$phase" = "Provisioned" ] || continue
      KUBECONFIG=/var/opt/kubitus/config/mgmt/mgmt.kubeconfig \
        kubectl patch node "$machine_name" -p '{"spec":{"providerID":"'$provider_id'"}}'
    done
    IFS=$old_IFS
    
  • I've created an issue in cluster-api to match machines and nodes by name, but that is probably not the correct place, so:

I see the following way forward:

  • fix this in cluster-api-provider-vsphere, with a new field (proposal: vspheremachine.spec.providerIDFromSystemUUID), building the ProviderID from node.Status.NodeInfo.SystemUUID using the same function as in CPI (a rough manual sketch follows this list)
  • fix this in cloud-provider-vsphere, by adding a mode without access to vSphere
  • fix this with a new "tiny" provider
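
A rough manual approximation of solution 1, for illustration (assumes the kubelet-reported SystemUUID matches the VM UUID that CPI would report; the real fix would reuse CPI's conversion function to normalize byte order):

# Build each node's providerID from the SystemUUID the kubelet already
# reports, so no vSphere API access is needed.
for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
  uuid=$(kubectl get node "$node" -o jsonpath='{.status.nodeInfo.systemUUID}')
  kubectl patch node "$node" -p "{\"spec\":{\"providerID\":\"vsphere://$uuid\"}}"
done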

See also a somewhat related issue in CSI: https://github.com/kubernetes-sigs/vsphere-csi-driver/issues/1742

EDIT 2022-06-01: Added third way forward

sathieu avatar May 11 '22 14:05 sathieu

@srm09 @MaxRink @yastij WDYT about my proposed ways forward? I can propose a PR for solution 1 (solution 2 is harder for me, and solution 3 probably means a new repo).

sathieu avatar Jun 01 '22 05:06 sathieu