cluster-api-provider-openstack

✨ Allow clusters without explicit availability zones

Open · mkjpryor opened this pull request 2 years ago • 13 comments

What this PR does / why we need it:

This PR adds the ability to create clusters without explicitly setting availability zones. The use case is discussed in detail in #1252.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #1252

Special notes for your reviewer:

Adds an additional, backwards-compatible flag to the OpenStack cluster spec.
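
For illustration, a minimal sketch of how the proposed flag might look on an OpenStackCluster (the field name ignoreAvailabilityZones is the one referred to later in this thread; the exact name and API version are not settled here):

  kind: OpenStackCluster
  metadata:
    name: example-cluster
  spec:
    # Hypothetical flag from this PR: do not publish Nova availability
    # zones into status.failureDomains for this cluster.
    ignoreAvailabilityZones: true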

TODOs:

  • [ ] squashed commits
  • if necessary:
    • [ ] includes documentation
    • [ ] adds unit tests

/hold

mkjpryor avatar Jun 01 '22 15:06 mkjpryor

Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

  • Latest commit: b2e18ea63600cf2b65a67d996b5e5147ad35c4c4
  • Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-cluster-api-openstack/deploys/629f0cf44f715b000aa83001
  • Deploy Preview: https://deploy-preview-1253--kubernetes-sigs-cluster-api-openstack.netlify.app

To edit notification comments on pull requests, go to your Netlify site settings.

netlify[bot] avatar Jun 01 '22 15:06 netlify[bot]

Hi @mkjpryor. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jun 01 '22 15:06 k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mkjpryor. To complete the pull request process, please assign seanschneeweiss after the PR has been reviewed. You can assign the PR to them by writing /assign @seanschneeweiss in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

  • Approvers can indicate their approval by writing /approve in a comment
  • Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Jun 01 '22 15:06 k8s-ci-robot

@mdbooth

Turns out it was basically as easy as I thought. This works like a dream for me. Can I get an /ok-to-test please?

mkjpryor avatar Jun 01 '22 15:06 mkjpryor

/ok-to-test

apricote avatar Jun 01 '22 15:06 apricote

/retest

mkjpryor avatar Jun 06 '22 15:06 mkjpryor

@jichenjc

I added some docs for the new option - can you review and suggest changes if required?

mkjpryor avatar Jun 07 '22 10:06 mkjpryor

@mdbooth

It might be an idea to merge the workers part of this fix separately. It's self-contained and very simple.

Happy to do this.

  1. We're working round behaviour which is defined by CAPI. We should discuss this with CAPI before making an API change in case they have any better ideas/imminent plans.

I'm not actually sure that we are. The InfraCluster.status.failureDomains field is explicitly optional in the spec (see https://cluster-api.sigs.k8s.io/developer/providers/cluster-infrastructure.html#infracluster-resources), and all this flag does is state that we don't care about AZs.
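
For context, when an infrastructure provider does populate status.failureDomains, it is a map of failure domain names to a small spec saying whether control plane machines may be placed there. With availability zones it looks roughly like this (zone names are placeholders):

  status:
    failureDomains:
      az-1:
        controlPlane: true
      az-2:
        controlPlane: true

Leaving that map empty or unset is effectively all the proposed flag does.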

However I don't disagree with your comment that there might be a better approach.

On that second point, I have in mind something like:

  failureDomainModel: (AvailabilityZone|ServerGroup|None)

instead of IgnoreFailureDomain. This is barely a half-baked thought so read nothing into the detail of it, but the critical difference is that it defines what it is rather than what it is not.

This could actually work quite well - the only other thing I can think of is host aggregates.

I guess for my specific case I would use failureDomainModel: ServerGroup, which would put the control plane nodes in a server group with either soft-anti-affinity or anti-affinity policies (this could be configurable). The way this could work in code is:

  1. OpenStackCluster reconciliation in CAPO creates a server group
  2. The ID of the server group is reported using OpenStackCluster.status.failureDomains with the flag that identifies it as suitable for control plane nodes
  3. This will cause CAPI to create control plane nodes with the server group ID as the failureDomain
  4. CAPO knows to use the failureDomain as the server group when creating the server
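
For illustration, step 2 above might surface the server group roughly like this (the UUID and the attributes key are placeholders, not an agreed API):

  status:
    failureDomains:
      0e6f3a1c-9d2b-4f4e-8c7a-1234567890ab:   # placeholder server group ID
        controlPlane: true
        attributes:
          # hypothetical marker so CAPO knows to treat the name as a
          # server group rather than an availability zone (step 4)
          failureDomainType: ServerGroup

CAPI would then copy the map key into Machine.spec.failureDomain for control plane machines, which is the value CAPO sees in step 4.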

What do you think?

mkjpryor avatar Jun 10 '22 12:06 mkjpryor

And I guess failureDomainModel: None would be basically what I have implemented with ignoreAvailabilityZones: true.
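
Side by side, the two shapes being compared here would read roughly as follows (sketch only; neither is the merged API):

  # As implemented in this PR:
  spec:
    ignoreAvailabilityZones: true

  # Under the alternative field discussed above:
  spec:
    failureDomainModel: None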

mkjpryor avatar Jun 10 '22 12:06 mkjpryor

@mdbooth

What if I change this PR to have failureDomainModel: AvailabilityZone | None instead of the flag, leaving us open for additional modes in the future?

Then submit another PR for #1256 that implements failureDomainModel: ServerGroup.

How does that sound as a plan?

mkjpryor avatar Jun 12 '22 19:06 mkjpryor

@mkjpryor: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jun 15 '22 17:06 k8s-ci-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Sep 13 '22 18:09 k8s-triage-robot

/remove-lifecycle stale

jichenjc avatar Sep 14 '22 00:09 jichenjc

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Dec 13 '22 00:12 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jan 12 '23 01:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-triage-robot avatar Feb 11 '23 02:02 k8s-triage-robot

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Feb 11 '23 02:02 k8s-ci-robot