operator-controller icon indicating copy to clipboard operation
operator-controller copied to clipboard

:warning: ClusterExtentionRevision conditions

Open perdasilva opened this issue 1 month ago • 4 comments

Description

Summary

This PR refactors the ClusterExtensionRevision status conditions to provide clearer, more actionable feedback about revision lifecycle states. The changes introduce a new Progressing condition, simplify condition reasons, and ensure that revision-level errors (retrying states) are properly surfaced to the parent ClusterExtension.

This PR does introduce API breaking changes. But, only to experimental APIs. It should be ok to override.

Key Changes

  1. Refactored ClusterExtensionRevision Status Conditions

Simplified condition types:

  • Retained Available and Succeeded conditions
  • Added new Progressing condition for better visibility into rollout state
  • Removed obsolete condition reasons and consolidated error handling

Updated condition reasons to be more semantic and actionable:

  • Before: ReconcileFailure, RevisionValidationFailure, PhaseValidationError, ObjectCollisions, RolloutSuccess, Incomplete, Progressing
  • After: Retrying, Blocked, RollingOut, Archived, Migrated

API Changes:

  • Added Progressing column to kubectl get clusterextensionrevisions output
  1. Surfaced Revision Retrying Condition to ClusterExtension

Modified the Boxcutter applier to propagate retrying errors from ClusterExtensionRevision to the parent ClusterExtension. When a revision is in a Progressing=True state with Reason=Retrying, the error is now surfaced to the ClusterExtension, providing better visibility into failed reconciliation attempts. When the revision is Progressing=True/Succeeded the applier returns success.

  1. Added Comprehensive E2E Tests

Added end-to-end tests for ClusterExtensionRevision lifecycle scenarios to validate the new condition states and transitions.

  1. Added ClusterExtension documentation to the RolloutInProgress Progression reason

ClusterExtensionRevision Status Conditions

Available Condition

Indicates whether the revision's objects are available and passing the object status probes.

Status Reason Description
True ProbesSuceeded Objects are available and pass all probes
False ProbeFailed One or more objects are failing their availability probes
Unknown Reconciling Probes could not be observed / intermittent reconciliation error
Unknown Archived Revision is archived
Unknown Migrated Revision was migrated from Helm release

Progressing Condition

Indicates whether the revision is progressing to its next state. It follows the same semantic as the ClusterExtension and Deployment Progressing condition.

Status Reason Description
True Retrying Reconciliation failed due to validation errors, object collisions, or other recoverable errors; will retry
True RollingOut Revision is actively rolling out objects across phases
True Succeeded Revision has completed its rollout successfully
False Archived Revision has been archived and is no longer progressing

Not a part of this PR yet, but Progressing can also be Blocked for non-retryable errors.

Succeeded Condition

Indicates whether the revision has successfully completed its rollout.

Status Reason Description
True Succeeded Revision succeeded rolling out all objects and passed all probes

Migration Notes

  • The Progressing condition replaces the previous pattern of setting Available=False with various failure reasons
  • All retrying scenarios (validation failures, collisions, reconcile errors) now consistently set Progressing=True, Reason=Retrying and Available=Unknown
  • The new condition structure provides clearer separation between "something is wrong" (Progressing=Retrying) vs "rollout is in progress" (Progressing=RollingOut)

Reviewer Checklist

  • [ ] API Go Documentation
  • [ ] Tests: Unit Tests (and E2E Tests, if appropriate)
  • [ ] Comprehensive Commit Messages
  • [ ] Links to related GitHub Issue(s)

perdasilva avatar Nov 18 '25 12:11 perdasilva

Deploy Preview for olmv1 ready!

Name Link
Latest commit 04de6035a66d175d8d47ba172c024e493c7c535c
Latest deploy log https://app.netlify.com/projects/olmv1/deploys/692eb12b817d0b000844e6b8
Deploy Preview https://deploy-preview-2340--olmv1.netlify.app
Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

netlify[bot] avatar Nov 18 '25 12:11 netlify[bot]

Codecov Report

:x: Patch coverage is 83.54430% with 13 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 71.04%. Comparing base (b23e124) to head (04de603). :warning: Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
internal/operator-controller/applier/boxcutter.go 0.00% 7 Missing :warning:
...controllers/clusterextensionrevision_controller.go 92.95% 5 Missing :warning:
...er/controllers/clusterextension_reconcile_steps.go 0.00% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2340      +/-   ##
==========================================
+ Coverage   70.58%   71.04%   +0.45%     
==========================================
  Files          93       93              
  Lines        7333     7294      -39     
==========================================
+ Hits         5176     5182       +6     
+ Misses       1721     1680      -41     
+ Partials      436      432       -4     
Flag Coverage Δ
e2e 44.84% <0.00%> (+0.36%) :arrow_up:
experimental-e2e 14.14% <0.00%> (?)
unit 58.97% <83.54%> (+0.36%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Nov 18 '25 18:11 codecov[bot]

/lgtm

dtfranz avatar Dec 04 '25 04:12 dtfranz

/approve

pedjak avatar Dec 04 '25 09:12 pedjak

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pedjak, rashmigottipati

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci[bot] avatar Dec 04 '25 09:12 openshift-ci[bot]

/override go-apidiff/go-apidiff

perdasilva avatar Dec 04 '25 09:12 perdasilva

@perdasilva: /override requires failed status contexts, check run or a prowjob name to operate on. The following unknown contexts/checkruns were given:

  • go-apidiff/go-apidiff

Only the following failed contexts/checkruns were expected:

  • Verify PR title
  • crd-diff
  • e2e
  • experimental-e2e
  • extension-developer-e2e
  • go-apidiff
  • go-verdiff
  • goreleaser
  • lint
  • netlify/olmv1/deploy-preview
  • st2ex-e2e
  • tide
  • unit-test-basic
  • upgrade-st2st-e2e
  • verify

If you are trying to override a checkrun that has a space in it, you must put a double quote on the context.

In response to this:

/override go-apidiff/go-apidiff

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci[bot] avatar Dec 04 '25 09:12 openshift-ci[bot]

/override go-apidiff

perdasilva avatar Dec 04 '25 09:12 perdasilva

@perdasilva: Overrode contexts on behalf of perdasilva: go-apidiff

In response to this:

/override go-apidiff

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci[bot] avatar Dec 04 '25 09:12 openshift-ci[bot]