
Support for NSX-T

Open akutz opened this issue 5 years ago • 12 comments

/kind feature

Describe the solution you'd like
CAPV should support NSX-T for the following (a hypothetical configuration sketch follows the list):

  • Load balancers for deploying multi-node, HA control plane clusters
    • Related to https://github.com/kubernetes-sigs/cluster-api/issues/1250
    • Ability to configure the NSX-T LB size
  • NSX-T as the CNI provider
    • Ability to configure different NSX-T node and pod IP blocks per cluster
    • Ability to configure different NSX-T T0 routers per cluster
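
The requested knobs might surface on the cluster object along these lines. This is a purely hypothetical sketch: no nsxt block exists in the CAPV API today, and every field name under it is invented for illustration.

```yaml
# Hypothetical only: illustrates the per-cluster settings requested above.
# The "nsxt" block and everything under it are invented field names.
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: VSphereCluster
metadata:
  name: workload-1
spec:
  nsxt:
    loadBalancer:
      size: SMALL                        # configurable NSX-T LB size
    t0RouterPath: /infra/tier-0s/t0-gw   # per-cluster T0 router
    nodeIPBlock: 10.244.0.0/16           # per-cluster node IP block (example range)
    podIPBlock: 10.245.0.0/16            # per-cluster pod IP block (example range)
```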

akutz avatar Aug 16 '19 18:08 akutz

I had been giving this some thought before this issue was opened. While it is possible, it will probably require a baseline NSX-T environment to already be in place and installed on the nodes of a given vSphere cluster before CAPV can install and configure the NCP components and Kubernetes clusters to support NSX-T. NSX-T in its current incarnation requires the following to install, and I am not sure how feasible it would be to do the NSX-T install as part of a CAPV cluster setup. (Full install instructions are here: https://docs.vmware.com/en/VMware-NSX-T-Data-Center/2.4/installation/GUID-3E0C4CEC-D593-4395-84C4-150CD6285963.html)

  1. Installation of the Controllers / Managers, which are virtual machines (usually 3).

  2. Installation of Edge VMs for ingress/egress from NSX-T logical networks (2-10).

  3. Preparation of the ESXi hosts in a given vSphere cluster with the NSX-T kernel modules, and creation of a Tunnel End Point (TEP) interface to encap/decap traffic, which usually requires a reboot.

  4. Networking infrastructure requirements: because of the encapsulation overhead, the physical switches that carry NSX-T Geneve frames (VTEP traffic) will need Jumbo frames enabled, or the MTU raised to at least 1600.

  5. Like other SDN solutions, the user will need to account for IPAM or plan addresses for several networks, including the Management and VTEP subnets and the uplink interfaces on the Edge VMs for ingress and egress. This does not include the K8s node and pod networks or the SNAT / VIP subnet for the load balancers.

tkrausjr avatar Aug 16 '19 20:08 tkrausjr

Hi, I managed to install NCP as the CNI in a workload cluster. In the next version, NCP won't require installing OVS and the CNI plugin manually. But we still have some challenges here.

  • The NCP container image has to come from a private repository. I used Harbor in my lab and needed to install a private CA into the deployed Ubuntu nodes. Is there any way to insert a private CA into the k8s nodes? (See the sketch at the end of this comment.)

  • The interface name that I want to use for OVS is not deterministic when a node has 2 NICs.

I can write up how to install NCP once the next NCP release is ready.
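
Not an answer from the thread, but one way the private CA question above is commonly handled with Cluster API is through the bootstrap config's files and preKubeadmCommands fields. A minimal sketch for Ubuntu nodes, assuming a Harbor CA; the resource name is illustrative and the apiVersion depends on the Cluster API release in use:

```yaml
# Sketch: distribute a private registry CA to Ubuntu nodes at bootstrap time.
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
kind: KubeadmConfigTemplate
metadata:
  name: workload-md-0                   # hypothetical name
spec:
  template:
    spec:
      files:
        - path: /usr/local/share/ca-certificates/harbor-ca.crt
          owner: root:root
          permissions: "0644"
          content: |
            -----BEGIN CERTIFICATE-----
            ...private CA PEM goes here...
            -----END CERTIFICATE-----
      preKubeadmCommands:
        - update-ca-certificates          # add the CA to the Ubuntu trust store
        - systemctl restart containerd    # so the container runtime picks up the new CA
```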

yktsubo avatar Aug 23 '19 07:08 yktsubo

/assign @timothysc
Can you assist with some test infra for this?

moshloop avatar Oct 01 '19 18:10 moshloop

/unassign @timothysc
We will have federated CI signal at some point on PRs, but that would be @akutz.

timothysc avatar Oct 01 '19 19:10 timothysc

Hey @yktsubo, are you saying you were able to set up and configure NCP on K8s as part of a guest cluster deployment with CAPV, or did you just post-install NCP / NSX-T on a CAPV-deployed cluster?

tkrausjr avatar Oct 30 '19 14:10 tkrausjr

Sorry, when I posted here I thought it worked since I could see an IP from kubectl get pod -o wide. However, the network didn't work properly because no interface was configured in the container namespace.

This deployment was done on a workload cluster deployed by CAPV. After the workload cluster was deployed, I deployed the NCP bootstrap containers on the cluster and tried to make it work, but I couldn't finish the installation. I think that's because NCP doesn't work with containerd yet.

yktsubo avatar Oct 31 '19 12:10 yktsubo

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Jan 29 '20 13:01 fejta-bot

/remove-lifecycle stale

moshloop avatar Feb 25 '20 15:02 moshloop

There is renewed work going into this effort, not specifically around NCP, but rather around supporting:

  • NSX-T IPAM: for node IP addresses that fall in a range allocated from NSX-T (today these are DHCP or statically configured, as in the sketch after this list)
  • LoadBalancer: as a way to use native NSX rather than tools like HAProxy in front of API servers
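
For context on the IPAM item, a rough sketch of how node addressing is declared today without NSX-T IPAM, using CAPV's existing static-IP fields. Names, addresses, and the apiVersion are illustrative, and other required fields (VM template, datacenter, etc.) are omitted for brevity:

```yaml
# Sketch: node IPs today are DHCP or statically listed per machine template;
# NSX-T IPAM would allocate them from an NSX-T IP block instead.
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: VSphereMachineTemplate
metadata:
  name: workload-md-0                     # hypothetical name
spec:
  template:
    spec:
      network:
        devices:
          - networkName: k8s-node-net     # hypothetical port group
            dhcp4: false
            ipAddrs:
              - 10.0.10.21/24             # example static address (CIDR form)
            gateway4: 10.0.10.1
            nameservers:
              - 10.0.0.10
```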

jayunit100 avatar Apr 10 '20 02:04 jayunit100

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot avatar Jul 16 '20 14:07 fejta-bot

any action here?

andrewrothstein avatar Feb 28 '22 05:02 andrewrothstein

@yastij @akutz is there any chance to resurrect https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/pull/722? The cloud provider for vSphere supports NSX-T load balancers; see https://github.com/kubernetes/cloud-provider-vsphere/blob/master/pkg/cloudprovider/vsphere/loadbalancer/README.md

lukasmrtvy avatar Jun 01 '23 22:06 lukasmrtvy