cluster-api-provider-openstack

✨ Allow clusters without explicit availability zones

Open · mkjpryor opened this pull request 2 years ago • 13 comments

What this PR does / why we need it:

This PR adds the ability to create clusters without explicitly setting availability zones. The use case is discussed in detail in #1252.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #1252

Special notes for your reviewer:

Adds an additional, backwards-compatible flag to the OpenStack cluster spec.
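
For illustration, a minimal sketch of how the proposed flag might look on an OpenStackCluster (the field name ignoreAvailabilityZones is the one referred to later in this thread; the exact name and API version are not settled here):

  kind: OpenStackCluster
  metadata:
    name: example-cluster
  spec:
    # Hypothetical flag from this PR: do not publish Nova availability
    # zones into status.failureDomains for this cluster.
    ignoreAvailabilityZones: true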

TODOs:

  • [ ] squashed commits
  • if necessary:
    • [ ] includes documentation
    • [ ] adds unit tests

/hold

mkjpryor avatar Jun 01 '22 15:06 mkjpryor

Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

  • Latest commit: b2e18ea63600cf2b65a67d996b5e5147ad35c4c4
  • Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-cluster-api-openstack/deploys/629f0cf44f715b000aa83001
  • Deploy Preview: https://deploy-preview-1253--kubernetes-sigs-cluster-api-openstack.netlify.app

To edit notification comments on pull requests, go to your Netlify site settings.

netlify[bot] avatar Jun 01 '22 15:06 netlify[bot]

Hi @mkjpryor. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jun 01 '22 15:06 k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mkjpryor. To complete the pull request process, please assign seanschneeweiss after the PR has been reviewed. You can assign the PR to them by writing /assign @seanschneeweiss in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

  • Approvers can indicate their approval by writing /approve in a comment
  • Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Jun 01 '22 15:06 k8s-ci-robot

@mdbooth

Turns out it was basically as easy as I thought. This works like a dream for me. Can I get an /ok-to-test please?

mkjpryor avatar Jun 01 '22 15:06 mkjpryor

/ok-to-test

apricote avatar Jun 01 '22 15:06 apricote

/retest

mkjpryor avatar Jun 06 '22 15:06 mkjpryor

@jichenjc

I added some docs for the new option - can you review and suggest changes if required?

mkjpryor avatar Jun 07 '22 10:06 mkjpryor

@mdbooth

It might be an idea to merge the workers part of this fix separately. It's self-contained and very simple.

Happy to do this.

  1. We're working round behaviour which is defined by CAPI. We should discuss this with CAPI before making an API change in case they have any better ideas/imminent plans.

I'm not actually sure that we are. The InfraCluster.status.failureDomains field is explicitly optional in the spec (see https://cluster-api.sigs.k8s.io/developer/providers/cluster-infrastructure.html#infracluster-resources), and all this flag does is state that we don't care about AZs.
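
For context, when an infrastructure provider does populate status.failureDomains, it is a map of failure domain names to a small spec saying whether control plane machines may be placed there. With availability zones it looks roughly like this (zone names are placeholders):

  status:
    failureDomains:
      az-1:
        controlPlane: true
      az-2:
        controlPlane: true

Leaving that map empty or unset is effectively all the proposed flag does.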

However I don't disagree with your comment that there might be a better approach.

On that second point, I have in mind something like:

  failureDomainModel: (AvailabilityZone|ServerGroup|None)

instead of IgnoreFailureDomain. This is barely a half-baked thought so read nothing into the detail of it, but the critical difference is that it defines what it is rather than what it is not.

This could actually work quite well - the only other thing I can think of is host aggregates.

I guess for my specific case I would use failureDomainModel: ServerGroup, which would put the control plane nodes in a server group with either soft-anti-affinity or anti-affinity policies (this could be configurable). The way this could work in code is:

  1. OpenStackCluster reconciliation in CAPO creates a server group
  2. The ID of the server group is reported using OpenStackCluster.status.failureDomains with the flag that identifies it as suitable for control plane nodes
  3. This will cause CAPI to create control plane nodes with the server group ID as the failureDomain
  4. CAPO knows to use the failureDomain as the server group when creating the server
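
For illustration, step 2 above might surface the server group roughly like this (the UUID and the attributes key are placeholders, not an agreed API):

  status:
    failureDomains:
      0e6f3a1c-9d2b-4f4e-8c7a-1234567890ab:   # placeholder server group ID
        controlPlane: true
        attributes:
          # hypothetical marker so CAPO knows to treat the name as a
          # server group rather than an availability zone (step 4)
          failureDomainType: ServerGroup

CAPI would then copy the map key into Machine.spec.failureDomain for control plane machines, which is the value CAPO sees in step 4.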

What do you think?

mkjpryor avatar Jun 10 '22 12:06 mkjpryor

And I guess failureDomainModel: None would be basically what I have implemented with ignoreAvailabilityZones: true.
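
Side by side, the two shapes being compared here would read roughly as follows (sketch only; neither is the merged API):

  # As implemented in this PR:
  spec:
    ignoreAvailabilityZones: true

  # Under the alternative field discussed above:
  spec:
    failureDomainModel: None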

mkjpryor avatar Jun 10 '22 12:06 mkjpryor

@mdbooth

What if I change this PR to have failureDomainModel: AvailabilityZone | None instead of the flag, leaving us open for additional modes in the future?

Then submit another PR for #1256 that implements failureDomainModel: ServerGroup.

How does that sound as a plan?

mkjpryor avatar Jun 12 '22 19:06 mkjpryor

@mkjpryor: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jun 15 '22 17:06 k8s-ci-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Sep 13 '22 18:09 k8s-triage-robot

/remove-lifecycle stale

jichenjc avatar Sep 14 '22 00:09 jichenjc

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Dec 13 '22 00:12 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Jan 12 '23 01:01 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-triage-robot avatar Feb 11 '23 02:02 k8s-triage-robot

@k8s-triage-robot: Closed this PR.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Feb 11 '23 02:02 k8s-ci-robot