
Add support for multiple vCenters

Open SandeepPissay opened this issue 3 years ago • 22 comments

Is this a BUG REPORT or FEATURE REQUEST?:

/kind feature

What happened: Currently, vSphere CSI supports a single vCenter server. There are customers who want to deploy Kubernetes across multiple vCenter servers, primarily for high availability. I'm creating this issue to track adding support for multiple vCenters in vSphere CSI.

SandeepPissay avatar Apr 01 '21 19:04 SandeepPissay

We are such a customer. :-) Would it be possible to implement this in such a way that we could migrate from a single-vCenter deployment to multi-vCenter?

tgelter avatar Apr 01 '21 19:04 tgelter

@tgelter You could also deploy Kubernetes across multiple datacenters within a single vCenter and use vCenter HA to protect against vCenter failure. Could you outline your use cases for running Kubernetes across multiple vCenter servers? So far, the only reason I have heard is protecting against vCenter failure for high availability. Any other reasons or use cases?

SandeepPissay avatar Apr 01 '21 19:04 SandeepPissay

We already leverage HA + DRS anti-affinity rules (SDRS anti-affinity rule automation is a WIP), but none of these things protect us from more catastrophic failures which affect an entire availability zone (e.g. network core failure, cooling issue, weather events).

We'd like to be able to split Kubernetes clusters across AZs, with portions of critical components ("master" & etcd nodes) and a fraction of the worker tier in each AZ. If there's a failure, workloads would be able to rely on unaffected infrastructure. We'd expect our users to be responsible for replicating data between environments by provisioning CNS volumes in each AZ and running pods in each as well.

I should also mention that our VMware engineering team prefers to run individual vCenter servers in each AZ.

tgelter avatar Apr 01 '21 19:04 tgelter
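
To make the per-AZ provisioning model described above concrete, one common pattern is a topology-restricted StorageClass per AZ, so each workload explicitly chooses where its CNS volumes land. The following is only a sketch under assumptions: the class name, SPBM policy name, zone value, and topology key (which varies by driver version and by the tag categories configured in vSphere) are placeholders, not details from this thread.

```yaml
# Hypothetical per-AZ StorageClass; topology key and values depend on the
# driver version and on the zone/region tag categories configured in vSphere.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cns-az1
provisioner: csi.vsphere.vmware.com
parameters:
  # Placeholder SPBM policy that selects datastores available in AZ 1.
  storagepolicyname: "az1-gold"
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.csi.vmware.com/k8s-zone   # example key; older releases used failure-domain labels
        values:
          - az-1
```

PVCs in each AZ then reference the matching class, and replicating data between AZs stays an application-level responsibility, as described above.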

Our infra team also takes the same approach; we run an individual vCenter per AZ/DC. Would love to see this implemented :)

RyanW8 avatar Apr 01 '21 19:04 RyanW8

@tgelter @RyanW8 Have you guys considered treating a Datacenter within a vCenter as an AZ, or a vCenter cluster as an AZ? This way you could have a single vCenter manage multiple AZs. We have seen a few customers using it this way.

SandeepPissay avatar Apr 01 '21 19:04 SandeepPissay

I don't quite understand what you mean. Could you please elaborate? :)

RyanW8 avatar Apr 01 '21 19:04 RyanW8

We've used this approach in the past. We found that we liked the reliability of dedicated vCenter Servers, and also ran into scaling limits on older versions (5.x, I believe) due to too many hosts/VMs.

tgelter avatar Apr 01 '21 19:04 tgelter

@RyanW8 Here's an example - https://vsphere-csi-driver.sigs.k8s.io/driver-deployment/deploying_csi_with_zones.html#set_up_zones_in_vsphere. In this example, we have a single vCenter. Each vCenter cluster is an AZ, and all the AZs are within a single Datacenter. You can also have one Datacenter per vCenter cluster.

SandeepPissay avatar Apr 01 '21 20:04 SandeepPissay
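
For readers following the linked zones guide: the zone/region boundaries come from vSphere tag categories plus a Labels section in the driver configuration. Below is a rough sketch of that single-vCenter layout, under assumptions; the hostname, credentials, Secret namespace, and tag-category names are placeholders, and the authoritative format is in the linked guide.

```yaml
# Illustrative only: single vCenter, one AZ per vCenter cluster, zones driven
# by vSphere tag categories. Names and the Secret namespace vary by driver version.
apiVersion: v1
kind: Secret
metadata:
  name: vsphere-config-secret
  namespace: vmware-system-csi
stringData:
  csi-vsphere.conf: |
    [Global]
    cluster-id = "cluster-1"

    [VirtualCenter "vc01.example.com"]
    user = "k8s-csi@vsphere.local"
    password = "REDACTED"
    port = "443"
    datacenters = "dc-1"

    # Tag categories applied to the vSphere inventory, as in the linked zones guide.
    [Labels]
    region = k8s-region
    zone = k8s-zone
```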

We found that we liked the reliability of dedicated vCenter Servers, and also ran into scaling limits on older versions (5.x, I believe) due to too many hosts/VMs.

@tgelter Thanks, this is useful info.

SandeepPissay avatar Apr 01 '21 20:04 SandeepPissay

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

fejta-bot avatar Jul 01 '21 14:07 fejta-bot

/remove-lifecycle stale

davidkarlsen avatar Jul 01 '21 22:07 davidkarlsen

+1, this is our architecture as well. We have 2 DCs, each with its own vCenter. We would very much like to have a K8s cluster span our datacenters, but we can't until the CSI driver supports it. :(

mstrent avatar Jul 08 '21 00:07 mstrent

+1, we have also deployed Kubernetes on top of several vCenters. We need this feature for a multi-regional setup.

wacken89 avatar Aug 03 '21 18:08 wacken89

Thanks @mstrent @wacken89. We have this in the backlog and I will discuss this with our PM.

SandeepPissay avatar Aug 04 '21 17:08 SandeepPissay

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Nov 02 '21 18:11 k8s-triage-robot

/remove-lifecycle stale

tgelter avatar Nov 02 '21 19:11 tgelter

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 31 '22 20:01 k8s-triage-robot

/remove-lifecycle stale

tgelter avatar Jan 31 '22 20:01 tgelter

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar May 01 '22 20:05 k8s-triage-robot

/remove-lifecycle stale

davidkarlsen avatar May 02 '22 00:05 davidkarlsen

Hello, has a decision been made yet on supporting multiple vCenters in vsphere-csi-driver? Similar to other users in this issue, we run a vCenter-per-AZ model, so far with up to 4 AZs (4 vCenters) in a region.

defo89 avatar Jul 06 '22 11:07 defo89

Same requirements here. And a question about a workaround: would it be possible to deploy 2 CSI drivers, one per vCenter, each with its own local configuration? Then we could rely on distinct storage classes.

poblin-orange avatar Jul 25 '22 10:07 poblin-orange

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Oct 23 '22 11:10 k8s-triage-robot

This is still important to our organization. The lack of it is a factor in our consideration of moving workloads off VMware to cloud providers.

/remove-lifecycle stale

mstrent avatar Oct 23 '22 17:10 mstrent

There is significant movement in this direction; see milestone https://github.com/kubernetes-sigs/vsphere-csi-driver/milestone/6

defo89 avatar Oct 23 '22 21:10 defo89

@defo89 Awesome! Thank you for the update.

mstrent avatar Oct 25 '22 18:10 mstrent

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 23 '23 18:01 k8s-triage-robot

/remove-lifecycle stale

tgelter avatar Jan 23 '23 21:01 tgelter

Any updates here? We also need this feature. Thx.

platovnick avatar Jan 30 '23 14:01 platovnick

The vSphere CSI driver now supports multiple vCenters. The latest release is available at https://github.com/kubernetes-sigs/vsphere-csi-driver/releases/tag/v3.0.0 and the documentation can be found at https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/3.0/vmware-vsphere-csp-getting-started/GUID-8B3B9004-DE37-4E6B-9AA1-234CDA1BD7F9.html

divyenpatel avatar Mar 23 '23 00:03 divyenpatel
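
For anyone landing here later: the v3.0 multi-vCenter support referenced above is driven by a single driver configuration that lists each vCenter. The sketch below is only illustrative and is not copied from the linked docs; hostnames, credentials, datacenter names, the Secret namespace, and tag categories are placeholders, and the exact keys should be taken from the VMware documentation linked in the previous comment.

```yaml
# Illustrative only: one driver configuration listing every vCenter (one per AZ).
apiVersion: v1
kind: Secret
metadata:
  name: vsphere-config-secret
  namespace: vmware-system-csi
stringData:
  csi-vsphere.conf: |
    [Global]
    cluster-id = "cluster-1"

    [VirtualCenter "vc-az1.example.com"]
    user = "k8s-csi@vsphere.local"
    password = "REDACTED"
    port = "443"
    datacenters = "dc-az1"

    [VirtualCenter "vc-az2.example.com"]
    user = "k8s-csi@vsphere.local"
    password = "REDACTED"
    port = "443"
    datacenters = "dc-az2"

    # Tag categories that define region/zone topology across both vCenters.
    [Labels]
    topology-categories = "k8s-region,k8s-zone"
```

Combined with topology-aware StorageClasses, this should allow a single cluster to span the AZ-per-vCenter layout discussed throughout this issue.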