api icon indicating copy to clipboard operation
api copied to clipboard

OTA-1339: `UpdateStatus`: API to expose information about update progress & health

Open petr-muller opened this issue 1 year ago • 13 comments
trafficstars

Implements the API proposed in https://github.com/openshift/enhancements/pull/1701 , addressing feedback on the early Update Health API Controller Proposal and Update Health API Draft docs.


Introduce a new UpdateStatus API (CRD) that exposes the status and health of the OpenShift cluster update process. In a typical OpenShift cluster, the API will be a singleton with an empty .spec, and its purpose will be to expose information through its status subresource.

UpdateStatus API Overview

apiVersion: update.openshift.io/v1alpha1
kind: UpdateStatus
metadata:
  name: cluster
spec: { }
status:
  controlPlane:
    _: ...
    informers:
    - name: cvo-example-informer
      insights: <list of insights reported by the informer>
    - name: mco-example-informer
      insights:
      - type: ClusterVersion # CV status insight
        _: ...
      - type: ClusterOperator # CO status insight
        _: ...
      - type: UpdateHealth # General update health insight
        _: ...
    conditions: <list of standard kubernetes conditions>
  workerPools:
  - name: workers
    _: ...
    informers: <list of informers with reported insights>
    conditions: <list of standard kubernetes conditions>
  - name: infra
    _: ...
  conditions: <list of standard kubernetes conditions>    

The API has three conceptual layers:

  1. Through the innermost layer .status.{controlPlane,workerPools[]}.informers, the API exposes detailed information about individual concerns related to the update, called "Update Insights." The API is prepared to allow multiple external informers to contribute insights, but in this enhancement, the only informer is the USC itself.
  2. The aggregation layer .status.{controlPlane,workerPools[]} reports higher-level information about the update through this layer, serving as the USC's interpretation of all insights.
  3. The outermost layer, .status.conditions, is used to report operational matters related to the USC itself (the health of the controller and gathering the insights, not the health of the update process).

We do not expect users to interact with UpdateStatus resources directly; the API is intended to be used mainly by tooling. A typical UpdateStatus instance is likely to be quite verbose.

I'm keeping iterations as commits but will squash before (eventual) merge.

petr-muller avatar Aug 26 '24 16:08 petr-muller

Hello @petr-muller! Some important instructions when contributing to openshift/api: API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

openshift-ci[bot] avatar Aug 26 '24 16:08 openshift-ci[bot]

Skipping CI for Draft Pull Request. If you want CI signal for your change, please convert it to an actual PR. You can still manually trigger a test run with /test all

openshift-ci[bot] avatar Aug 26 '24 16:08 openshift-ci[bot]

@petr-muller: This pull request references OTA-1339 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.18.0" version, but no target version was set.

In response to this:

~~Initial draft of the API, not expecting API reviewers to review yet, just OTA~~

Still kinda WIP but it starts being partially reviewable. I still need to incorporate some feedback from Update Health API Draft and also revise this paying more attention to OpenShift API Conventions.

I'm keeping iterations as commits but will squash before (eventual) merge.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Oct 07 '24 16:10 openshift-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: petr-muller Once this PR has been reviewed and has the lgtm label, please assign sjenning for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci[bot] avatar Dec 17 '24 16:12 openshift-ci[bot]

Looking at the results of the lint check here, a large number of these I know are auto-fixable, try make lint-fix on the PR and it should bring a lot of that stuff in line

JoelSpeed avatar Dec 18 '24 14:12 JoelSpeed

Ahaa! This is something I just merged which is showing that there is a diff between make lint-fix and what's checked in (that's what the git diff is showing), so please try make lint-fix, and then make lint to see what's left to fix

JoelSpeed avatar Dec 19 '24 17:12 JoelSpeed

@JoelSpeed I'm still changing the content a bit, I planned to address the linter issues once I'm done

petr-muller avatar Dec 19 '24 18:12 petr-muller

@petr-muller: This pull request references OTA-1339 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.19.0" version, but no target version was set.

In response to this:

Implements the API proposed in https://github.com/openshift/enhancements/pull/1701 , addressing feedback on the early Update Health API Controller Proposal and Update Health API Draft docs.


Introduce a new UpdateStatus API (CRD) that exposes the status and health of the OpenShift cluster update process. In a typical OpenShift cluster, the API will be a singleton with an empty .spec, and its purpose will be to expose information through its status subresource.

UpdateStatus API Overview

apiVersion: update.openshift.io/v1alpha1
kind: UpdateStatus
metadata:
 name: cluster
spec: { }
status:
 controlPlane:
   _: ...
   informers:
   - name: cvo-example-informer
     insights: <list of insights reported by the informer>
   - name: mco-example-informer
     insights:
     - type: ClusterVersion # CV status insight
       _: ...
     - type: ClusterOperator # CO status insight
       _: ...
     - type: UpdateHealth # General update health insight
       _: ...
   conditions: <list of standard kubernetes conditions>
 workerPools:
 - name: workers
   _: ...
   informers: <list of informers with reported insights>
   conditions: <list of standard kubernetes conditions>
 - name: infra
   _: ...
 conditions: <list of standard kubernetes conditions>    

The API has three conceptual layers:

  1. Through the innermost layer .status.{controlPlane,workerPools[]}.informers, the API exposes detailed information about individual concerns related to the update, called "Update Insights." The API is prepared to allow multiple external informers to contribute insights, but in this enhancement, the only informer is the USC itself.
  2. The aggregation layer .status.{controlPlane,workerPools[]} reports higher-level information about the update through this layer, serving as the USC's interpretation of all insights.
  3. The outermost layer, .status.conditions, is used to report operational matters related to the USC itself (the health of the controller and gathering the insights, not the health of the update process).

We do not expect users to interact with UpdateStatus resources directly; the API is intended to be used mainly by tooling. A typical UpdateStatus instance is likely to be quite verbose.

I'm keeping iterations as commits but will squash before (eventual) merge.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Jan 22 '25 16:01 openshift-ci-robot

/cc @JoelSpeed

petr-muller avatar Jan 22 '25 16:01 petr-muller

Please review the openshift API conventions, in particular I'd like to see the godocs expanded here. This is a very large API, and without proper documentation it's hard to understand what all of the various parts of it are for, I think from an end user perspective it is important to be able to explain each of these things

There's a tension between writing very detailed user-oriented docs and (my) expectation of being asked to change API structure during review :D I will make a round of improving the docs though.

petr-muller avatar Feb 04 '25 17:02 petr-muller

There's a tension between writing very detailed user-oriented docs and (my) expectation of being asked to change API structure during review :D

I very much appreciate that, but, unless I understand what we are trying to do with an API, it's very hard to recommend appropriate structures where things need to be changed. And docs is one of the best ways to get this understanding (I'd be asking the questions about it anyway so you'd either answer in GH comments or directly as godoc)

Happy to jump on a call at some point if you think going through the API sync might be helpful

JoelSpeed avatar Feb 04 '25 17:02 JoelSpeed

Yeah, I am not protesting, no worries. I hope it is at least a bit clear that I have not ignored Godocs entirely :D I mostly just have not, because of Curse of Knowledge, made a good tradeoff between reviewer understanding and investing effort into things that will be dropped. I will add more.

petr-muller avatar Feb 04 '25 18:02 petr-muller

@petr-muller: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/verify-client-go 8495ca67b053c5fbb5a59ad272ea6004502f7891 link true /test verify-client-go
ci/prow/verify 8495ca67b053c5fbb5a59ad272ea6004502f7891 link true /test verify
ci/prow/integration 8495ca67b053c5fbb5a59ad272ea6004502f7891 link true /test integration
ci/prow/verify-crd-schema 8495ca67b053c5fbb5a59ad272ea6004502f7891 link true /test verify-crd-schema
ci/prow/integration 8495ca67b053c5fbb5a59ad272ea6004502f7891 link true /test integration
ci/prow/verify 8495ca67b053c5fbb5a59ad272ea6004502f7891 link true /test verify
ci/prow/verify-crd-schema 8495ca67b053c5fbb5a59ad272ea6004502f7891 link true /test verify-crd-schema

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci[bot] avatar Apr 11 '25 22:04 openshift-ci[bot]

/shrug

petr-muller avatar Apr 14 '25 16:04 petr-muller

/close

Proposal was rejected internally

petr-muller avatar Apr 16 '25 13:04 petr-muller

@petr-muller: Closed this PR.

In response to this:

/close

Proposal was rejected internally

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci[bot] avatar Apr 16 '25 13:04 openshift-ci[bot]