ceph-csi
ceph-csi copied to clipboard
Move topologyConstrainedPools to Beta status
Describe the feature you'd like to have
At this time, topologyContrainedPools has been in alpha stage for 2 years with very low noise. The feature is not documented but in the helm chart and its status might scares away lots of potential users.
What is the value to the end user? (why is it a priority?)
We're implementing a multi-zone cluster on on-premise bare metal. This cluster will be home of a few large elasticsearch clusters. We have 21 nodes in the cluster : 1- 3 masters on vms 2- 6 workers for applications 3- 12 workers for storage
All those nodes are spread on 3 datacenters connected with evpn fabrics. This feature will allow us to build statefulset and scale them with storage and pods evenly spread on the different datacenters using the topologySpreadConstraints.
We're doing rook deployment for our ceph cluster. We created 3 cephblockpools, one for each zone by changing the crushRoot to the zone root. Then we create a storageclass with corresponding topologyContrainedPools with matching zones. And finally we added the corresponding topologySpreadContrain to our statefulSet.
How will we know we have a good solution? (acceptance criteria)
Right now, the feature is ok but has to be enabled on the nodeplugin daemonset for it to work. Moving to beta then GA will make this activation optional and will have proper documentation for it. After discussion on Slack, it seems that this feature is to be completed but I failed to fully understand what's missing.
Additional context
One this feature in beta stage, the rook operator needs to have a way to enable it on its ressources.
csi-provisionner doesn't seems to need '--feature-gate=Topology=True' anymore.
According to the log :
csi-provisioner W0126 10:05:23.447685 1 feature_gate.go:237] Setting GA feature gate Topology=true. It will be removed in a future release.
Lets try to move this to Beta or supported state , we were tracking this effort under this https://github.com/ceph/ceph-csi/issues/2199 and most of it has been addressed, this is one of the last pending items... @sbskas
@sbskas would you be able to help us to move this feature to BETA in next/upcoming release ? What that basically means is that, more testing , adding tests if its missing and also updating the documentation ? please let us know .. Really appreciate if we can lift the support to BETA in next release. we welcome contributions.
as discussed in triage call, we could target this for 3.6 if we have a volunteer, otherwise we will move this out of the 3.6 rleease.
@humblec I volonteer.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.
@sbskas as mentioned in the release issue, we are planning to release 3.6 early next week..
As per your validation of this feature , is it good to qualify as Beta ? or we need more time to declare it as "beta" . please share your thought, accordingly we can reach consensus on support state
@humblec we tested all the case we met on test platform. We fixed the obvious problem encountered (#2828 and #2925). According to me, the feature is ready to go to beta. I would have like something like #2962 for easier integration with rook however, I'll live with it.
Thanks @sbskas for all the effort and conclusion on where we stand.
I am fine to call this Beta, however let me get atleast an ack from @ceph/ceph-csi-maintainers
@Madhu-1 @nixpanic can you share your thought about moving this feature to Beta in this release (3.6) ?
@Madhu-1 @nixpanic can you share your thought about moving this feature to Beta in this release (3.6) ?
IMO As we are at end of the release, Let's not hurry we can wait for #2962. based on that we can move to beta? if we are still in alpha we have a good chance to do some breaking changes if really needed.
Even if the feature is alpha we can work on the integration part.
All of this looks to have made is in release-3.7 quite a while ago. Please open a new issue if something isn't working as planned/expected.