public-cloud-roadmap

Multi-AZs clusters

Open mhurtrel opened this issue 3 years ago • 27 comments

As an MKS administrator, I want to spawn Kubernetes clusters distributed across multiple low-latency availability zones, so that I can spread worker nodes across regions and benefit from even better HA for my K8s control plane, with a contractual SLA.

Note: We are currently targeting France first.

mhurtrel avatar Oct 22 '20 19:10 mhurtrel

We already forked the OpenStack CCM to implement multi-region at OVH and are running it in production. Feel free to ask if you are interested.

cambierr avatar Nov 25 '20 22:11 cambierr

We already forked the OpenStack CCM to implement multi-region at OVH and are running it in production. Feel free to ask if you are interested.

@cambierr I'm looking for that solution. I would appreciate it if you could share.

mr-ssd avatar Feb 24 '21 03:02 mr-ssd

It's available at https://hub.docker.com/repository/docker/alphanetworkstv/openstack-cloud-controller-manager-amd64

The only change compared to the upstream version is that you need to provide the allowed regions in the config:

[Global]
username=...
password=...
auth-url=https://auth.cloud.ovh.net/v3
tenant-id=...
domain-id=default
region=GRA5
regions=GRA3,GRA5,GRA7

[Networking]
internal-network-name=...
ipv6-support-disabled=true
public-network-name=Ext-Net

[BlockStorage]
bs-version = v3
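
As a quick sanity check before deploying, a config like the one above can be parsed to confirm that the default `region` is one of the listed `regions`. This is only a minimal sketch, assuming the file is INI-compatible as shown; the `check_regions` helper and the trimmed sample are hypothetical and are not part of the CCM or of the fork.

```python
import configparser

# Trimmed copy of the config above; credentials are left out on purpose.
SAMPLE_CONF = """
[Global]
auth-url=https://auth.cloud.ovh.net/v3
domain-id=default
region=GRA5
regions=GRA3,GRA5,GRA7
"""


def check_regions(conf_text):
    """Return the allowed regions, raising if the default region is not among them."""
    # interpolation=None so that a '%' in a password is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    global_section = parser["Global"]
    default_region = global_section["region"].strip()
    raw_regions = global_section.get("regions", fallback="")
    allowed = [r.strip() for r in raw_regions.split(",") if r.strip()]
    if default_region not in allowed:
        raise ValueError(
            f"default region {default_region!r} is not in regions {allowed}"
        )
    return allowed


if __name__ == "__main__":
    print(check_regions(SAMPLE_CONF))
```

Running it on the sample prints the three GRA regions; a config whose `region` is missing from `regions` raises a `ValueError`.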

I still need to push the code somewhere to share the sources, by the way.

CSI is also available: https://hub.docker.com/repository/docker/alphanetworkstv/cinder-csi-plugin-amd64

cambierr avatar Feb 25 '21 08:02 cambierr

Can we have something integrated into the console, so that node pools can easily be deployed in different regions?

tanandy avatar Jul 10 '21 09:07 tanandy

@mhurtrel has there been any movement on this? We're desperately in need of it because we're being affected by http://travaux.ovh.net/?do=details&id=50121& that is dependent on some upstream OpenStack fix - we recently had a 14hr outage because the OVH Volume wouldn't re-attach to any of our pods after a deployment. A multi-region cluster would avoid this.

@cambierr this looks interesting. I am not sure how to use it, but I assume we will need to use the OpenStack client to set it up? I'll see if I can get some help from an ops engineer. In the meantime, can you provide any resources or a guide on how to use these? I'd like to set up two test clusters to play with it.

zcourts avatar Jul 16 '21 17:07 zcourts

@zcourts what I built is a version of https://github.com/kubernetes/cloud-provider-openstack that supports multi-region clusters. It is not an extension of OVH's managed clusters.

If you are still interested, you can create your own cluster with kubeadm, rke, or whatever tool you want, then deploy the OpenStack cloud controller in your cluster.

Basically, you can do the exact same thing as per https://github.com/kubernetes/cloud-provider-openstack except:

  • use "my" Docker images instead of the official ones
  • include the regions parameter in the default conf (cf. https://github.com/ovh/public-cloud-roadmap/issues/22#issuecomment-785731012)

Based on that, the CCM will query all the provided regions for instance data instead of only the default one. The CSI (the "Kubernetes/OpenStack volume translator") will also work this way and will handle volumes in the right region for each instance.

Please be aware that volumes from region A cannot be mounted in region B!
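
To make the behaviour described above concrete, here is a minimal Python sketch of the lookup logic: search every configured region for an instance, then refuse to attach a volume that lives in another region. It is an illustration only, not the fork's actual code; `find_instance_region`, `can_attach` and the in-memory inventory are hypothetical stand-ins for real OpenStack API calls.

```python
def find_instance_region(instance_id, regions, instances_by_region):
    """Return the first configured region that knows the instance, or None."""
    for region in regions:
        if instance_id in instances_by_region.get(region, set()):
            return region
    return None


def can_attach(volume_region, instance_region):
    """A volume can only be attached to an instance in the same region."""
    return instance_region is not None and volume_region == instance_region


if __name__ == "__main__":
    # Hypothetical inventory: which instance IDs exist in which region.
    regions = ["GRA3", "GRA5", "GRA7"]
    inventory = {"GRA5": {"node-a"}, "GRA7": {"node-b"}}

    region = find_instance_region("node-b", regions, inventory)
    print(region)                      # region where node-b was found
    print(can_attach("GRA5", region))  # volume in GRA5, instance elsewhere
```

With a single default region, the lookup would stop after the first entry; iterating over the whole `regions` list is the only difference the sketch is meant to show.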

Feel free to ask if you need any help.

cambierr avatar Jul 16 '21 19:07 cambierr

@mhurtrel has there been any movement on this? We're desperately in need of it because we're being affected by http://travaux.ovh.net/?do=details&id=50121& that is dependent on some upstream OpenStack fix - we recently had a 14hr outage because the OVH Volume wouldn't re-attach to any of our pods after a deployment. A multi-region cluster would avoid this.

@cambierr this looks interesting. I am not sure how to use it, but I assume we will need to use the OpenStack client to set it up? I'll see if I can get some help from an ops engineer. In the meantime, can you provide any resources or a guide on how to use these? I'd like to set up two test clusters to play with it.

I have seen that Scaleway Kosmos provides easy multi-cluster integration. You can create a cluster there and use OVH nodes until we have something in the OVH console.

tanandy avatar Jul 16 '21 21:07 tanandy

@tanandy thanks, I didn't know about Scaleway's Kosmos (https://www.scaleway.com/fr/betas/#kuberneteskosmos). It is in private beta though, so it won't be an option for our production environments right now (it also requires an invite to access).

@cambierr ahhhh, that's clearer now. A couple of questions come to mind:

  1. How do OVH resources (volumes etc.) get provisioned? Are they requested by the CCM or some sub-component and added automatically, or do you need to attach what you need outside k8s and then use it?
  2. How many control planes do you end up having? A single one or one per region?

We've been working on a design that uses Istio multi-cluster. In this setup we would have a control plane per region. The obvious benefit is that we can lose a region entirely and continue operating; the challenge is the increased complexity of managing and controlling access to multiple control planes/clusters.

We've not gotten as far as building test clusters with this yet, as the issue only affected us 2 weeks ago, but we're progressing along this route. I'll bring your links to our team's attention for them to consider as well.

How have you found doing it this way so far? Any common/obvious issues?

zcourts avatar Jul 17 '21 08:07 zcourts

How do OVH resources (volumes etc.) get provisioned? Are they requested by the CCM or some sub-component and added automatically, or do you need to attach what you need outside k8s and then use it?

The CCM and CSI are responsible for provisioning, just as with the "official" providers. For instance, the scheduler allocates resources on a node, and the CSI then talks to that node's OpenStack cluster to provision volumes as needed.

How many control planes do you end up having? A single one or one per region?

A single one. This is not a multi-Kubernetes-cluster setup but a single cluster on top of multiple OpenStack regions.

So, in our setup we use regions from GRA, UK, and SBG in our cluster. It is therefore a multi-region Kubernetes cluster with "ultra high HA", given three regions that each have their own infrastructure (power, network). This brings us the benefit of HA without the complexity of federation.
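
With a single control plane spanning several regions, workloads still have to be spread explicitly to benefit from it. One common way to do that in Kubernetes is a topology spread constraint; the sketch below builds such a pod spec fragment in Python. It assumes the nodes carry the standard `topology.kubernetes.io/region` label (whether the forked CCM sets it is an assumption here), and the `spread_across_regions` helper and the `web` label are hypothetical.

```python
import json


def spread_across_regions(app_label, max_skew=1):
    """Build a pod spec fragment that spreads matching pods evenly across regions."""
    return {
        "topologySpreadConstraints": [
            {
                # Allowed difference in pod count between any two regions.
                "maxSkew": max_skew,
                # Well-known node label that identifies the region.
                "topologyKey": "topology.kubernetes.io/region",
                # Keep the pod pending rather than break the spread.
                "whenUnsatisfiable": "DoNotSchedule",
                "labelSelector": {"matchLabels": {"app": app_label}},
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(spread_across_regions("web"), indent=2))
```

The printed fragment would go under `spec.template.spec` of a Deployment whose pods are labelled `app: web`.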

cambierr avatar Jul 17 '21 10:07 cambierr

Why was this issue closed?

tanandy avatar Jan 17 '22 14:01 tanandy

Hi @tanandy, this was a mistake. I confirm we will work on this at a later stage.

mhurtrel avatar Jan 17 '22 15:01 mhurtrel

Hi,

What's the status of this issue?

Grounz avatar Feb 15 '22 15:02 Grounz

Hi OVH, is there any news on the multi-region cluster? As said by @zcourts, this option exists with Scaleway Kosmos and works very well. Do you have a schedule on the roadmap?

qualitesys avatar Jun 08 '22 20:06 qualitesys

Hi @Grounz and @qualitesys, I confirm that we will develop a solution for this, but I can't yet share a public ETA. We are exploring options for a very rich multi-region, multicloud and multicluster experience. I will update this issue when possible.

mhurtrel avatar Jun 09 '22 08:06 mhurtrel

Hi @mhurtrel

Do you have any news?

lenglet-k avatar Sep 06 '22 09:09 lenglet-k

Our current ETA is early 2023.

mhurtrel avatar Sep 06 '22 12:09 mhurtrel

Hi @mhurtrel, any news on this feature?

botylev avatar Mar 13 '23 19:03 botylev

Hello @Spark3757, there has been a small delay in the availability of our IaaS pillars in RBX, which is needed to fully validate our plans. I should be able to give a new ETA soon. Sorry for the delay.

mhurtrel avatar Mar 13 '23 19:03 mhurtrel

Hi @mhurtrel, any news on this feature?

yctn avatar May 23 '23 16:05 yctn

Unfortunately, we are not yet able to share an ETA, though it remains a priority. As soon as we have an ETA from our IaaS colleagues for the dependencies, I will update this issue.

mhurtrel avatar May 23 '23 16:05 mhurtrel

A small update on the matter: multi-region clusters will not be provided in the foreseeable future in Managed Kubernetes Service, but through a new product offering the capability to manage self-managed Kubernetes control planes by bringing your own nodes.

I have refocused this issue on multi-AZ clusters, which will be offered in our multi-AZ regions, the first one being planned in France. We cannot give you an ETA yet, but be assured it is identified as a priority.

mhurtrel avatar Sep 21 '23 15:09 mhurtrel

Hello @mhurtrel

Could we manage a multi-AZ cluster across different infrastructures, such as a PCI / HPC / HPC SecNumCloud mix?

What do you mean by "the ability to manage control planes ourselves"? Does that mean that we will be able to add control planes and manage their configurations and updates? Will it be SecNumCloud compatible?

lenglet-k avatar Sep 22 '23 07:09 lenglet-k

Hi @lenglet-k

This issue (#22) will focus on the multi-AZ (single region) Managed Kubernetes Service (leveraging Public Cloud instances only). We will however also offer a multicloud/multicluster solution (in private beta in the next few months) to build and manage self-managed clusters on any infrastructure: https://github.com/ovh/public-cloud-roadmap/issues/467. This one will at first require the infrastructure to offer internet connectivity, but will at a later stage support vRack-only connectivity. Yes, you will be able to manage the control plane, using a supported distribution (more details soon). It will not be SecNumCloud compatible at launch.

mhurtrel avatar Sep 25 '23 09:09 mhurtrel

Hi everyone! Though we of course still plan to support managed Kubernetes clusters based on 3-AZ regions, I also wanted to let you know that we just released Managed Rancher Service in alpha (aka private beta). Among many other features, this product enables you to create and self-manage clusters based on any infrastructure. You could for example spawn bare-metal machines or VMs in multiple regions (from OVHcloud, other cloud providers or even on-prem, provided the machines have internet access) to build an extremely highly available cluster.

Do not hesitate to consult this page to learn more and fill in the short form to be one of the first users of this new managed service: https://labs.ovhcloud.com/en/managed-rancher-service/

mhurtrel avatar Oct 17 '23 11:10 mhurtrel

Rancher? Ouch!

Will the multi-region MKS be based on it?

cambierr avatar Oct 17 '23 12:10 cambierr

@cambierr nope, MKS and Managed Rancher Service are two different products. Multi-zone MKS will be made available quickly after the first multi-AZ OVHcloud Public Cloud region is made available, and will not require Managed Rancher Service.

mhurtrel avatar Oct 17 '23 17:10 mhurtrel

@mhurtrel thank you for the update. I would like to give you some honest feedback: if you want to keep your clients, you need to move from saying "we of course still plan to support 3AZ-regions-based managed Kubernetes clusters" to "This is the deadline for delivery and we are meeting it".

salimidruide avatar Mar 07 '24 14:03 salimidruide