cloud-provider-openstack
[cinder-csi-plugin] Multi region/clouds support for controllerServer
Affected binaries:
- cinder-csi-plugin
What this PR does / why we need it: Enable multi-region / multi-cloud support in the cinder-csi-plugin controllerServer.
Which issue this PR fixes (if applicable): fixes #2035
Special notes for reviewers: I have a Kubernetes cluster stretched across multiple OpenStack clusters (3 OVH and 1 on-premise, linked by a private dark fiber). That is why my approach is more "multi-cloud" than only "multi-region".
I don't touch the nodeServer behavior (I simply deploy a DaemonSet with a nodeSelector on a label that selects nodes based on their hypervisor, and mount a dedicated secret with the associated OpenStack credentials).
The focus is on the controllerServer, which handles all Kubernetes requests to manage PVCs; it must be able to CRUD volumes on every managed OpenStack cluster.
I propose to use the gcfg subsection feature to be able to handle multiple Global "sections" in config files. This keeps backward compatibility with the existing config file syntax.
I chose to use StorageClass parameters to deal with cloud selection (I add an optional field cloud which contains the config Global subsection name). This way, when a CreateVolumeRequest comes in, the controllerServer can identify which OSInstance to use based on a match between the SC parameters.cloud field and its config Global subsections.
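For illustration only, a StorageClass using this initially proposed parameter might have looked like the sketch below (the cloud value is a hypothetical Global subsection name; a later comment replaces this parameter with a secret-based approach):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cinder-ovh-gra11
provisioner: cinder.csi.openstack.org
parameters:
  # name of a [Global "..."] subsection in cloud.conf (hypothetical value)
  cloud: OVH-GRA11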
Issue
This PR is currently an MVP, which permits creating volumes and attaching them to the correct VM / OpenStack cluster, but I have an issue to troubleshoot: not all CSI spec messages contain volume_context or parameters; in particular, DeleteVolumeRequest only contains the volume_id.
I have a few ideas to troubleshoot this issue:
- query each managed OpenStack cloud to find the one that has a volume with the searched ID (that's linear complexity)
- implement a Kubernetes client and ask the Kubernetes API to find it (bad idea, I just realized that volume_id is the OpenStack ID, not the PV name)
- maybe use the secrets field, as seems to be suggested in the external-provisioner discussion
Release note:
feat: add multi-region / multi-cloud support.
Welcome @MatthieuFin!
It looks like this is your first PR to kubernetes/cloud-provider-openstack 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.
You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.
You can also check if kubernetes/cloud-provider-openstack has its own contribution guidelines.
You may want to refer to our testing guide if you run into trouble with your tests not passing.
If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!
Thank you, and welcome to Kubernetes. :smiley:
Hi @MatthieuFin. Thanks for your PR.
I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
As mentioned here, and after some local tests, this seems to be the proper implementation approach.
I propose to avoid using the StorageClass parameter.cloud and the volume context to propagate the cloud config name between gRPC calls, and instead use a secret with a key cloud containing the Global config subsection name, which permits the controller to retrieve the proper credentials from its configuration.
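A minimal sketch of that proposal, assuming a Global subsection named OVH-GRA11 and secret/class names of my own choosing (a full per-region example appears later in this thread):
apiVersion: v1
kind: Secret
metadata:
  name: cloud-config-sc-ovh-gra11   # hypothetical name
  namespace: kube-system
type: Opaque
stringData:
  cloud: OVH-GRA11   # the [Global "..."] subsection name the controller should use
The secret is then referenced from the StorageClass parameters so the external-provisioner passes it to the controller in the secrets field of CreateVolume/DeleteVolume calls (only the provisioner secret is shown here):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cinder-ovh-gra11
provisioner: cinder.csi.openstack.org
parameters:
  csi.storage.k8s.io/provisioner-secret-name: cloud-config-sc-ovh-gra11
  csi.storage.k8s.io/provisioner-secret-namespace: kube-system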
/ok-to-test
Sorry, I won't have time to take a closer look before my vacation; I'll be back in 2 weeks. @kayrus might be the proper person to check this PR out.
Looks like the tests have to be fixed here.
/retest
/retest
Hi,
After a few days of use on my cluster, I encountered an issue regarding allowedTopologies set on the StorageClass (first of all, don't forget to enable the Topology feature gate by adding the argument --feature-gates Topology=true to the csi-provisioner container), which permits pushing the allowedTopologies constraints into the PV .spec.nodeAffinity and keeping the affinity on pod reschedule.
My issue concerns topology keys: currently the only available key is topology.cinder.csi.openstack.org/zone, and my provider OVH uses the same zone name (nova) on each of their OpenStack clusters, so I need to add the possibility to manage another key to differentiate them.
We have exactly the same problem (several OpenStack clouds with the same availability zone name).
I'm interested in this feature; let me know if you need some test feedback.
Regards
Hi !
You could use my last commit to build the image; if you're too lazy to build it you could use mine, exposed here.
Personally I set a label topology.kubernetes.io/region on my nodes which contains my region name, so I have a DaemonSet with a nodeSelector on this label to deploy my node plugin (nodeServers), and I add the argument --additionnal-topology topology.kubernetes.io/region=GRA9 (for example) to the container.
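For illustration, a hypothetical fragment of such a DaemonSet could look like this (label value, paths and args mirror the examples in this thread; everything else is assumed):
# nodeplugin DaemonSet fragment for nodes in region GRA9 (illustrative only)
spec:
  template:
    spec:
      nodeSelector:
        topology.kubernetes.io/region: GRA9
      containers:
      - name: cinder-csi-plugin
        args:
        - /bin/cinder-csi-plugin
        - --endpoint=unix://csi/csi.sock
        - --cloud-config=/etc/kubernetes/cloud.conf
        - --additionnal-topology=topology.kubernetes.io/region=GRA9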
This permits creating a StorageClass with the following allowedTopologies:
allowedTopologies:
- matchLabelExpressions:
  - key: topology.cinder.csi.openstack.org/zone
    values:
    - nova
  - key: topology.kubernetes.io/region
    values:
    - GRA9
Moreover, you have to enable the option ignore-volume-az in the controllerServer configuration:
[BlockStorage]
bs-version=v3
ignore-volume-az=True
[Global]
auth-url="https://auth.cloud.ovh.net/v3"
....
That way, if you print your CSINode object you should see:
topologyKeys:
- topology.cinder.csi.openstack.org/zone
- topology.kubernetes.io/region
which permits you to correctly use your StorageClass based on a parameter other than the availability zone name.
Do we agree that I have to deploy the whole cinder CSI stack (DaemonSet + Deployment) for both of my OpenStacks, with a different region field and cloud.conf for each OpenStack, right?
You have to provide a secret with your different OpenStack clusters' credentials:
[BlockStorage]
bs-version=v3
ignore-volume-az=True
[Global "OVH-GRA11"]
auth-url="https://auth.cloud.ovh.net/v3"
username="****"
password="****"
region="GRA11"
tenant-id="****"
tenant-name="****"
domain-name="Default"
[Global "OVH-GRA9"]
auth-url="https://auth.cloud.ovh.net/v3"
username="***"
password=""
region="GRA9"
tenant-id="****"
tenant-name="****"
domain-name="Default"
...
One Deployment for the controller with the cloud-name argument referencing your configuration, in this case --cloud-name=GRA9 --cloud-name=GRA11, which permits your controller to manage all your OpenStack clusters.
And one nodeServer DaemonSet with a nodeSelector per OpenStack cluster, with the --cloud-name=GRA9 option for the corresponding OpenStack cluster and --additionnal-topology topology.kubernetes.io/region=GRA9.
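For illustration, the matching controller container fragment might look like this (a hypothetical sketch; the repeated --cloud-name argument is the point here):
# controller Deployment fragment: one controller manages both clouds (illustrative only)
containers:
- name: cinder-csi-plugin
  args:
  - /bin/cinder-csi-plugin
  - --endpoint=unix://csi/csi.sock
  - --cloud-config=/etc/kubernetes/cloud.conf
  - --cloud-name=GRA9
  - --cloud-name=GRA11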
I'm sharing my local Helm chart to deploy this stack, it will probably be simpler than words ;-) csi-cinder.tar.gz
Did you add something specific to the build step? I have this error:
Args (comma-delimited): /bin/cinder-csi-plugin,-v=2,--endpoint=unix://csi/csi.sock,--cloud-config=/etc/kubernetes/cloud.conf,--additionnal-topology=topology.kubernetes.io/region=dc1-int
/bin/cinder-csi-plugin: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /bin/cinder-csi-plugin)
2024/02/27 14:20:19 Now listening for interrupts
/bin/cinder-csi-plugin: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /bin/cinder-csi-plugin)
I built the binary locally with:
VERSION=v1.29.1-rc.16 make build cinder-csi-plugin
and the Docker image with:
REGISTRY='0g4p75r8.c1.gra9.container-registry.ovh.net/infra' VERSION=v1.29.1-rc.16 make build-local-image-cinder-csi-plugin
You need docker buildx, which should embed the right version of glibc. On my local system I have glibc 2.39:
$ ldd --version
ldd (GNU libc) 2.39
/retest
/retest
I took your latest commit and rebuilt the image. The controller successfully lists volumes on both OpenStack clusters and the daemons are up and running with the right region and topology labels.
But the provisioner cannot create a volume, failing with this error:
I0229 09:08:39.891611 1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"kube-system", Name:"pod-with-volume", UID:"c8694fba-26a7-4288-a9b9-9b065023585a", APIVersion:"v1", ResourceVersion:"1746245", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "region-1": rpc error: code = InvalidArgument desc = [CreateVolume] specified cloud undefined
even with the extra args --additionnal-topology=topology.kubernetes.io/region=region-2 --cloud-name=region-2 on the cinder-csi-plugin container,
with this cloud config:
[Global "region-1"] region=region-1 tls-insecure=true auth-url=https://keystone.region-1.example:5000/v3 username=region-1-member password=region-1-password tenant-name=region-1-tenant [Global "region-2"] tls-insecure=true region=region-2 auth-url=https://keystone.region-2.example:5000/v3 username=region-2-member password=region-2-password tenant-name=region-2-tenant [BlockStorage] bs-version=v3 ignore-volume-az=true rescan-on-resize=true [Metadata] search-order=metadataService
Yes, I think you didn't create the appropriate secret referencing your Global section "region-1". Create the following secret in the kube-system namespace:
apiVersion: v1
stringData:
  cloud: region-1
kind: Secret
metadata:
  name: cloud-config-sc-region-1
  namespace: kube-system
type: Opaque
And link this secret to your StorageClass in parameters as follows:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
  name: cinder-region-1
allowVolumeExpansion: true
allowedTopologies:
- matchLabelExpressions:
  - key: topology.cinder.csi.openstack.org/zone
    values:
    - nova
  - key: topology.kubernetes.io/region
    values:
    - region-1
parameters:
  # Controller Publish/Unpublish Secret
  csi.storage.k8s.io/controller-publish-secret-name: cloud-config-sc-region-1
  csi.storage.k8s.io/controller-publish-secret-namespace: kube-system
  # Node Publish Secret
  csi.storage.k8s.io/node-publish-secret-name: cloud-config-sc-region-1
  csi.storage.k8s.io/node-publish-secret-namespace: kube-system
  # Node Stage Secret
  csi.storage.k8s.io/node-stage-secret-name: cloud-config-sc-region-1
  csi.storage.k8s.io/node-stage-secret-namespace: kube-system
  # Create/Delete Volume Secret
  csi.storage.k8s.io/provisioner-secret-name: cloud-config-sc-region-1
  csi.storage.k8s.io/provisioner-secret-namespace: kube-system
provisioner: cinder.csi.openstack.org
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
Do the same for region-2 (and adapt to your case).
Hi, this version has been deployed on our cluster for a couple of weeks and it seems to be stable.
@dulek @mdbooth @jichenjc @kayrus Please take a look.
With the help of Matthieu I successfully deployed this version on one of our clusters spread over several OpenStacks. It works like a charm. We plan to test it on a single OpenStack in order to check for regressions.
Sorry for the latency, I had some emergencies to resolve first. @dulek thanks for your review; I tried to answer your questions and made the requested changes. I will look into writing some documentation; if anyone wants to help with the documentation writing I would appreciate it <3.
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: MatthieuFin (f9ab5b39075fb2be34dda32e299970b1ecebe65f, 29536a2b6be21eb7522d8c4cd0d66a5173888ee2, b3b83bfaec054bc30bd15e0cfc7d6d8d51cbd94d, 89383ee830ecbb854bcf92d22c4c6298ac0d959f, b0e5a1cc1731a85d18228fd243f39f19f0588de4, 62c71e3c4b1b4cb6230ab529345d57901c27708a, 56fcbd2d9110ce2c5c7289376a5cf927985e52f5)
Thanks to @sebastienmusso, @fllaap and @Krast76 who offered to help write documentation.
@dulek, @jichenjc, @kayrus, @mdbooth how can we advance this PR?
Thank you for this PR, very interesting idea. 👍
So the complexity of this PR is that the node plugin needs OpenStack API credentials for ephemeral storage, which is deprecated. Perhaps we should ask about the removal process for it?
The killer feature is to use one StorageClass name in a StatefulSet deployment. Based on nodeAffinity/podAntiAffinity, the Kubernetes scheduler spreads the pods across the regions/zones.
I stopped using allowedTopologies in StorageClasses because they are immutable, which makes them very hard to maintain. It's better to use nodeSelector/nodeAffinity instead; you can change these parameters during the lifetime.
To help the Kubernetes scheduler choose the best region/zone (if not set), we need to expose free disk capacity.
I've added a capacity capability - https://github.com/kubernetes/cloud-provider-openstack/pull/2597
@dulek, @jichenjc, @kayrus, @mdbooth how can we advance this PR? And if this capability is not available within the OpenStack cinder-csi-plugin, how can we handle stateful storage with Cinder in a Kubernetes cluster spread over multiple OpenStack clusters?
I don't know how other clouds are handling this kind of cross-region setup (e.g. does the AWS cloud provider handle this?), maybe someone can help comment.
And as far as I know, OpenStack itself handling multiple clouds (regions) is not that good; especially, I don't think we know the free size of the Cinder storage backend IIRC, and Neutron might give us additional issues, as cross-region usually means no connection (not sure about the impact on our network-related implementation).
So overall I think we may focus on compute (VM) provisioning first and claim the rest as a gap / follow-up?
Hi @jichenjc ,
Other cloud providers (I'll take GCP as an example) offer a single storage class which is able to serve pods across multiple "regions".
To do that, they implement the topology spec in their CSI implementation with the help of the PluginCapability VOLUME_ACCESSIBILITY_CONSTRAINTS.
Technically our OpenStack implementation doesn't support the PluginCapability VOLUME_ACCESSIBILITY_CONSTRAINTS, so the gRPC CreateVolume calls don't have the accessibility_requirements field filled.
We could implement support for the PluginCapability VOLUME_ACCESSIBILITY_CONSTRAINTS to get information in the accessibility_requirements field and retrieve topology information from the scheduler, or give the topology chosen by the CSI plugin back to the scheduler (depending on the volumeBindingMode). Anyway, this approach doesn't help us here.
Indeed, GCP or AWS can implement that because all regions are uniformly managed by one cloud API, so when you ask your cloud API about a volume_id in a DeleteVolumeRequest for example (same for other calls), you get a response no matter which region the volume is in.
In our case, managing multiple OpenStack regions is the same as managing multiple OpenStack clusters. So I have n cloud APIs for n regions, whereas in GCP you have n regions behind only 1 cloud API.
The "GCP/AWS-like" behavior would probably be to manage "regions" with the availability zone notion in OpenStack. But that is not my use case; I need to be able to support multiple OpenStack clusters, not multiple AZs in one OpenStack cluster. And I don't know of a public cloud provider based on OpenStack which properly implements OpenStack availability zones...
Moreover, if we want to handle multiple clouds (OpenStack regions) with the help of the PluginCapability VOLUME_ACCESSIBILITY_CONSTRAINTS, we have 2 possibilities:
- modify the CSI spec to add some additional data which permits tracing which cloud owns a volume in gRPC calls (I don't think that will be possible/accepted)
- implement an "OpenStack-API-like" service which exposes a single API to the CSI plugin, manages multiple clouds in the backend, and maintains an internal database to know which cloud has each volume_id, avoiding possible volume_id collisions between 2 different clouds
Personally, I find these 2 solutions far too complex to implement and maintain.
This PR permits doing it in a way that is easier to understand and maintain, IMO.
I have one StorageClass per region/cloud, and if I want to deploy 1 StatefulSet across 3 regions I create 3 PVCs, one per region (each with a different SC: data-0, data-1, data-2), and I use in the StatefulSet:
spec.volumeClaimTemplates:
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: data
That way I'm able to dispatch pods from the same StatefulSet across different StorageClasses and different OpenStack clusters.
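As a hedged illustration of that pattern (the StatefulSet name db, the claim size and the namespace are hypothetical): with a volumeClaimTemplate named data, the StatefulSet controller expects claims named data-db-0, data-db-1, data-db-2, and it only creates the ones that don't already exist, so pre-creating them lets each ordinal land on its own StorageClass:
# pre-created claim for ordinal 0, bound to the region-1 StorageClass (illustrative only)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-db-0
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: cinder-region-1
  resources:
    requests:
      storage: 10Gi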
If you want a "multi-region" implementation like GCP or AWS:
- Currently I couldn't implement it because I don't have an OpenStack cluster environment with multiple AZs.
- You would lose a feature that GCP/AWS don't offer, which is storage support across multiple OpenStack clusters.
From my limited experience as a cloud consumer, OpenStack providers offer OpenStacks with multiple regions and not with multiple AZs. It's sad, I know, but that seems to be the state of the public offerings.
Concerning cross-region networking, indeed it doesn't have to be managed by Neutron; personally I manage it myself outside of OpenStack with some BGP routing and L2 physical interconnections (dark fiber) across my DCs, plus IPsec tunnels for dev/POC links between DCs that are not physically linked (I shared my first thoughts about it here, but that's out of this PR's scope). I know that @sebastienmusso runs k8s clusters spread across multiple OpenStack clusters too; I don't know how you handle this point?
Implementing the capacity capability as proposed by @sergelogvinov could probably help in a multi-AZ environment, but in this PR's context it should probably not have an impact.
+1
I think we do not break anything with the VOLUME_ACCESSIBILITY_CONSTRAINTS capability, and in a multi-region setup we need to add the extra label to the volume topology.
So we need to decide which label to use: topology.kubernetes.io/region or a CSI-specific label like cinder.csi.openstack.org/cluster. In the second case, the CSI controller should label the nodes by itself.
I prefer to use the well-known label topology.kubernetes.io/region, but it adds a limitation -> region == openstack-cluster-name.
As 'openstack-cluster-name' is a string, it should not be a problem.
I agree, support for the VOLUME_ACCESSIBILITY_CONSTRAINTS capability should be compatible with this multi-cloud PR implementation. If anyone has access to a multi-AZ OpenStack cluster :crossed_fingers:.
I intentionally leave the user the choice of which label to use; topology.kubernetes.io/region is just mentioned as an example for the CLI argument --additional-topology. That way we are not stuck if we want to manage this label with OCCM or manually by the user.