
Tracking issue for improving resources status in CAPO


This is a tracking issue for CAPO-related effort to improve resources status.

High-level changes required by the new CAPI contract

Most of these changes will be required by the v1beta2 API contract (tentatively April 2025).

OpenStackCluster

The following changes are planned in the contract for the OpenStackCluster resource (a sketch of the resulting status shape follows the notes below):

  • Disambiguate the usage of the ready term by renaming fields used for the initial provisioning workflow
    • Rename status.ready to status.initialization.provisioned.
  • Remove failureReason and failureMessage.

Notes:

  • OpenStackCluster's status.initialization.provisioned will surface into Cluster's status.initialization.infrastructureProvisioned field.
  • OpenStackCluster's status.initialization.provisioned must signal the completion of the initial provisioning of the cluster infrastructure. The value of this field should never be updated after provisioning is completed, and Cluster API will ignore any changes to it.
  • OpenStackCluster's status.conditions[Ready] will surface into Cluster's status.conditions[InfrastructureReady] condition.
  • OpenStackCluster's status.conditions[Ready] must surface issues during the entire lifecycle of the OpenStackCluster (both during initial OpenStackCluster provisioning and after the initial provisioning is completed).
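
For illustration, here is a minimal sketch in Go of what the reworked OpenStackCluster status could look like. Apart from Conditions using metav1.Condition, the type and field names are assumptions derived from the notes above, not the final API.

```go
// A minimal sketch, assuming the v1beta2 field names described above.
package v1beta2

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// OpenStackClusterInitialization groups the fields that only describe the
// initial provisioning workflow, replacing the old status.ready boolean.
type OpenStackClusterInitialization struct {
	// Provisioned signals completion of the initial cluster infrastructure
	// provisioning and is never flipped back afterwards.
	Provisioned bool `json:"provisioned,omitempty"`
}

// OpenStackClusterStatus drops failureReason/failureMessage entirely; the
// Ready condition covers the whole lifecycle instead.
type OpenStackClusterStatus struct {
	Initialization OpenStackClusterInitialization `json:"initialization,omitempty"`
	Conditions     []metav1.Condition             `json:"conditions,omitempty"`
}
```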

OpenStackMachine

The following changes are planned in the contract for the OpenStackMachine resource (a sketch of the provisioning rule follows the notes below):

  • Disambiguate the usage of the ready term by renaming fields used for the initial provisioning workflow
    • Rename status.ready to status.initialization.provisioned.
  • Remove failureReason and failureMessage.

Notes:

  • OpenStackMachine's status.initialization.provisioned will surface into Machine's status.initialization.infrastructureProvisioned field.
  • OpenStackMachine's status.initialization.provisioned must signal the completion of the initial provisioning of the machine infrastructure. The value of this field should never be updated after provisioning is completed, and Cluster API will ignore any changes to it.
  • OpenStackMachine's status.conditions[Ready] will surface into Machine's status.conditions[InfrastructureReady] condition.
  • OpenStackMachine's status.conditions[Ready] must surface issues during the entire lifecycle of the Machine (both during initial OpenStackMachine provisioning and after the initial provisioning is completed).
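
To make the "never updated after provisioning is completed" rule concrete, here is a hedged sketch of how a reconciler could set the field. The OpenStackMachineStatus shape and the markProvisioned helper are illustrative assumptions, not CAPO's actual code.

```go
// Illustrative sketch only; type and helper names are assumptions.
package v1beta2

// OpenStackMachineStatus mirrors the cluster status shape sketched above.
type OpenStackMachineStatus struct {
	Initialization struct {
		Provisioned bool `json:"provisioned,omitempty"`
	} `json:"initialization,omitempty"`
}

// markProvisioned sets Provisioned once machine infrastructure provisioning
// completes and never clears it afterwards: Cluster API ignores any later
// changes to the field.
func markProvisioned(status *OpenStackMachineStatus, infraReady bool) {
	if status.Initialization.Provisioned {
		return // provisioning already completed; leave the field alone
	}
	if infraReady {
		status.Initialization.Provisioned = true
	}
}
```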

Notes on Conditions

Some remarks about Kubernetes API conventions with regard to conditions (a sketch follows the list below):

  • Polarity: Condition type names should make sense for humans; neither positive nor negative polarity can be recommended as a general rule
  • Use of the Reason field is required (currently, in Cluster API, reasons are added only when conditions are false)
  • Controllers should apply their conditions to a resource the first time they visit the resource, even if the status is Unknown (currently, Cluster API controllers add conditions at different stages of the reconcile loop). Please note that:
    • If more than one controller adds conditions to the same resource, conditions managed by the different controllers will be applied at different times.
    • Kubernetes API conventions account for exceptions to this rule; for known conditions, the absence of a condition status should be interpreted the same as Unknown, and typically indicates that reconciliation has not yet finished.
  • We'll be using metav1.Conditions from the Kubernetes API.
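
As a rough sketch of these conventions, a controller could write all of its known conditions on the first reconcile using metav1.Condition and the meta.SetStatusCondition helper from apimachinery. The "Ready" type and "ReconciliationInProgress" reason below are placeholders, not agreed-upon CAPO constants.

```go
// Illustrative only: condition type and reason strings are placeholders.
package conditions

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setInitialConditions shows what a controller could do on its first visit
// to a resource: every known condition is written immediately, with Unknown
// status and a Reason that is always populated (not only when the condition
// is false).
func setInitialConditions(conditions *[]metav1.Condition, generation int64) {
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:               "Ready",
		Status:             metav1.ConditionUnknown,
		Reason:             "ReconciliationInProgress", // Reason is mandatory
		Message:            "Reconciliation has not completed yet",
		ObservedGeneration: generation,
	})
}
```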

Terminal Failures

By getting rid of terminal failures, we have an opportunity to improve how reliably CAPO handles OpenStack infrastructure failures, such as API rate limits or temporary unavailability, which unfortunately happen often in large-scale production clouds. We'll need to investigate what these failures can be and how we treat them (a sketch follows the list below):

  • CAPO continues to reconcile the resource and updates conditions with a temporary state
  • CAPO stops reconciling the resource and updates conditions with a human-readable error message
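
A hedged sketch of what these two options could look like in a reconcile function; the reason strings and the retryable parameter are assumptions made for illustration, not CAPO's actual error classification.

```go
// Illustrative sketch only; names and reasons are assumptions.
package sketch

import (
	"time"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	ctrl "sigs.k8s.io/controller-runtime"
)

// handleOpenStackError maps an infrastructure failure onto the Ready
// condition instead of the removed failureReason/failureMessage fields.
// retryable would be decided from the OpenStack error (rate limit, 5xx, ...).
func handleOpenStackError(conds *[]metav1.Condition, err error, retryable bool) (ctrl.Result, error) {
	if retryable {
		// Option 1: keep reconciling; report a temporary state and requeue.
		meta.SetStatusCondition(conds, metav1.Condition{
			Type:    "Ready",
			Status:  metav1.ConditionFalse,
			Reason:  "OpenStackAPITemporarilyUnavailable",
			Message: err.Error(),
		})
		return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
	}
	// Option 2: stop reconciling; leave a human-readable error on the
	// condition and wait for a spec or infrastructure change.
	meta.SetStatusCondition(conds, metav1.Condition{
		Type:    "Ready",
		Status:  metav1.ConditionFalse,
		Reason:  "OpenStackAPIError",
		Message: err.Error(),
	})
	return ctrl.Result{}, nil
}
```

In both cases the error stays observable through the Ready condition; the difference is only whether a requeue is requested.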

EmilienM (Nov 27 '24)

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot (Apr 21 '25)

/remove-lifecycle stale

lentzi90 (Apr 22 '25)

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot (Jul 21 '25)

/remove-lifecycle stale

lentzi90 (Jul 28 '25)

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot (Oct 26 '25)

/remove-lifecycle stale

lentzi90 (Oct 27 '25)