
Add support for multiple vCenters

Open SandeepPissay opened this issue 3 years ago • 22 comments

Is this a BUG REPORT or FEATURE REQUEST?:

/kind feature

What happened: Currently, vSphere CSI supports a single vCenter server. There are customers who want to deploy Kubernetes across multiple vCenter servers, primarily for high availability. I'm creating this issue to track adding support for multiple vCenters in vSphere CSI.

SandeepPissay avatar Apr 01 '21 19:04 SandeepPissay

We are such a customer. :-) Would it be possible to implement this in such a way that we could migrate from a single-vCenter deployment to multi-vCenter?

tgelter avatar Apr 01 '21 19:04 tgelter

@tgelter You could also deploy Kubernetes across multiple datacenters within a single vCenter and use vCenter HA to protect against vCenter failure. Could you outline your use cases for running Kubernetes across multiple vCenter servers? So far, the only reason I have heard is protecting against vCenter failure for high availability. Any other reasons or use cases?

SandeepPissay avatar Apr 01 '21 19:04 SandeepPissay

We already leverage HA + DRS anti-affinity rules (SDRS anti-affinity rule automation is a WIP), but none of these things protect us from more catastrophic failures which affect an entire availability zone (e.g. network core failure, cooling issue, weather events).

We'd like to be able to split Kubernetes clusters across AZs, with portions of critical components ("master" & etcd nodes) and a fraction of the worker tier in each AZ. If there's a failure, workloads would be able to rely on unaffected infrastructure. We'd expect our users to be responsible for replicating data between environments by provisioning CNS volumes in each AZ and running pods in each as well.

I should also mention that our VMware engineering team prefers to run individual vCenter servers in each AZ.

tgelter avatar Apr 01 '21 19:04 tgelter
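
To make the per-AZ provisioning model described above concrete, one common pattern is a topology-restricted StorageClass per AZ, so each workload explicitly chooses where its CNS volumes land. The following is only a sketch under assumptions: the class name, SPBM policy name, zone value, and topology key (which varies by driver version and by the tag categories configured in vSphere) are placeholders, not details from this thread.

```yaml
# Hypothetical per-AZ StorageClass; topology key and values depend on the
# driver version and on the zone/region tag categories configured in vSphere.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cns-az1
provisioner: csi.vsphere.vmware.com
parameters:
  # Placeholder SPBM policy that selects datastores available in AZ 1.
  storagepolicyname: "az1-gold"
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.csi.vmware.com/k8s-zone   # example key; older releases used failure-domain labels
        values:
          - az-1
```

PVCs in each AZ then reference the matching class, and replicating data between AZs stays an application-level responsibility, as described above.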

Our infra team also takes the same approach; we run an individual vCenter per AZ/DC. Would love to see this implemented :)

RyanW8 avatar Apr 01 '21 19:04 RyanW8

@tgelter @RyanW8 Have you guys considered treating a Datacenter within a vCenter as an AZ, or a vCenter cluster as an AZ? This way you could have a single vCenter manage multiple AZs. We have seen a few customers using it this way.

SandeepPissay avatar Apr 01 '21 19:04 SandeepPissay

I don't quite understand what you mean. Could you please elaborate? :)

RyanW8 avatar Apr 01 '21 19:04 RyanW8

We've used this approach in the past. We found that we liked the reliability of dedicated vCenter Servers, and also ran into scaling limits on older versions (5.x, I believe) due to too many hosts/VMs.

tgelter avatar Apr 01 '21 19:04 tgelter

@RyanW8 Here's an example - https://vsphere-csi-driver.sigs.k8s.io/driver-deployment/deploying_csi_with_zones.html#set_up_zones_in_vsphere. In this example, we have a single vCenter. Each vCenter cluster is an AZ, and all the AZs are within a single Datacenter. You can also have one Datacenter per vCenter cluster.

SandeepPissay avatar Apr 01 '21 20:04 SandeepPissay
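
For readers following the linked zones guide: the zone/region boundaries come from vSphere tag categories plus a Labels section in the driver configuration. Below is a rough sketch of that single-vCenter layout, under assumptions; the hostname, credentials, Secret namespace, and tag-category names are placeholders, and the authoritative format is in the linked guide.

```yaml
# Illustrative only: single vCenter, one AZ per vCenter cluster, zones driven
# by vSphere tag categories. Names and the Secret namespace vary by driver version.
apiVersion: v1
kind: Secret
metadata:
  name: vsphere-config-secret
  namespace: vmware-system-csi
stringData:
  csi-vsphere.conf: |
    [Global]
    cluster-id = "cluster-1"

    [VirtualCenter "vc01.example.com"]
    user = "k8s-csi@vsphere.local"
    password = "REDACTED"
    port = "443"
    datacenters = "dc-1"

    # Tag categories applied to the vSphere inventory, as in the linked zones guide.
    [Labels]
    region = k8s-region
    zone = k8s-zone
```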

We found that we liked the reliability of dedicated vCenter Servers, and also ran into scaling limits on older versions (5.x, I believe) due to too many hosts/VMs.

@tgelter Thanks, this is useful info.

SandeepPissay avatar Apr 01 '21 20:04 SandeepPissay

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar Jul 01 '21 14:07 fejta-bot

/remove-lifecycle stale

davidkarlsen avatar Jul 01 '21 22:07 davidkarlsen

+1, this is our architecture as well. We have 2 DCs, each with its own vCenter. We would very much like to have a K8s cluster span our datacenters, but we can't until the CSI driver supports it. :(

mstrent avatar Jul 08 '21 00:07 mstrent

+1, we have also deployed Kubernetes on top of several vCenters. We need this feature for a multi-regional setup.

wacken89 avatar Aug 03 '21 18:08 wacken89

Thanks @mstrent @wacken89. We have this in the backlog and I will discuss this with our PM.

SandeepPissay avatar Aug 04 '21 17:08 SandeepPissay

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 02 '21 18:11 k8s-triage-robot

/remove-lifecycle stale

tgelter avatar Nov 02 '21 19:11 tgelter

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 31 '22 20:01 k8s-triage-robot

/remove-lifecycle stale

tgelter avatar Jan 31 '22 20:01 tgelter

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar May 01 '22 20:05 k8s-triage-robot

/remove-lifecycle stale

davidkarlsen avatar May 02 '22 00:05 davidkarlsen

Hello, has a decision been made yet on supporting multiple vCenters in vsphere-csi-driver? Similar to other users in this issue, we run a vCenter-per-AZ model, so far with up to 4 AZs (4 vCenters) in a region.

defo89 avatar Jul 06 '22 11:07 defo89

Same requirements here. And a question about a workaround: would it be possible to deploy 2 CSI drivers, one per vCenter, each with its own local configuration? Then we could rely on distinct storage classes.

poblin-orange avatar Jul 25 '22 10:07 poblin-orange

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 23 '22 11:10 k8s-triage-robot

This is still important to our organization. The lack of it is a factor in our consideration of moving workloads off VMware to cloud providers.

/remove-lifecycle stale

mstrent avatar Oct 23 '22 17:10 mstrent

There is significant movement in this direction; see milestone https://github.com/kubernetes-sigs/vsphere-csi-driver/milestone/6

defo89 avatar Oct 23 '22 21:10 defo89

@defo89 Awesome! Thank you for the update.

mstrent avatar Oct 25 '22 18:10 mstrent

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 23 '23 18:01 k8s-triage-robot

/remove-lifecycle stale

tgelter avatar Jan 23 '23 21:01 tgelter

Any updates here? We also need this feature. Thx.

platovnick avatar Jan 30 '23 14:01 platovnick

The vSphere CSI driver now supports multiple vCenters. The latest release is available at https://github.com/kubernetes-sigs/vsphere-csi-driver/releases/tag/v3.0.0 and the documentation can be found at https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/3.0/vmware-vsphere-csp-getting-started/GUID-8B3B9004-DE37-4E6B-9AA1-234CDA1BD7F9.html

divyenpatel avatar Mar 23 '23 00:03 divyenpatel
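
For anyone landing here later: the v3.0 multi-vCenter support referenced above is driven by a single driver configuration that lists each vCenter. The sketch below is only illustrative and is not copied from the linked docs; hostnames, credentials, datacenter names, the Secret namespace, and tag categories are placeholders, and the exact keys should be taken from the VMware documentation linked in the previous comment.

```yaml
# Illustrative only: one driver configuration listing every vCenter (one per AZ).
apiVersion: v1
kind: Secret
metadata:
  name: vsphere-config-secret
  namespace: vmware-system-csi
stringData:
  csi-vsphere.conf: |
    [Global]
    cluster-id = "cluster-1"

    [VirtualCenter "vc-az1.example.com"]
    user = "k8s-csi@vsphere.local"
    password = "REDACTED"
    port = "443"
    datacenters = "dc-az1"

    [VirtualCenter "vc-az2.example.com"]
    user = "k8s-csi@vsphere.local"
    password = "REDACTED"
    port = "443"
    datacenters = "dc-az2"

    # Tag categories that define region/zone topology across both vCenters.
    [Labels]
    topology-categories = "k8s-region,k8s-zone"
```

Combined with topology-aware StorageClasses, this should allow a single cluster to span the AZ-per-vCenter layout discussed throughout this issue.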