
No operator action required for `CRDsWithProblematicConversionWebhooks`

Status: Open · opened by acumino

What would you like to be added: Currently, clusters where the `CRDsWithProblematicConversionWebhooks` condition is true are not marked as "no operator action required". If this condition is present on a cluster, the cluster should be marked as "no operator action required".

Why is this needed: This issue occurs when a user has CRDs that don't follow best practices, and the operator can't do anything to fix it. To keep the dashboard clean, clusters where the `CRDsWithProblematicConversionWebhooks` condition is true should be marked as "no operator action required".

acumino · Sep 23 '24

Currently, the dashboard checks the error codes under `status.lastErrors`. One of these codes is `ERR_PROBLEMATIC_WEBHOOK`, which we flag as a user error. Which entry appears in `lastErrors` when the `CRDsWithProblematicConversionWebhooks` condition is true?
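To make the check above concrete, here is a minimal sketch of how a "no operator action required" flag could be derived from `status.lastErrors`. The `LastError` shape, the `USER_ERROR_CODES` set, and the `isNoOperatorActionRequired` function are hypothetical and for illustration only. Only `ERR_PROBLEMATIC_WEBHOOK` comes from this thread, and the dashboard's actual logic and code list may differ.

```typescript
// Hypothetical, simplified shape of an entry in a shoot's status.lastErrors.
interface LastError {
  description: string;
  codes?: string[];
}

// Assumption: the set of codes treated as user errors. Only
// ERR_PROBLEMATIC_WEBHOOK is taken from this thread; the real list may differ.
const USER_ERROR_CODES: ReadonlySet<string> = new Set(["ERR_PROBLEMATIC_WEBHOOK"]);

// Returns true when every last error carries at least one user-error code,
// i.e. nothing in lastErrors points at a problem the operator could fix.
function isNoOperatorActionRequired(lastErrors: LastError[] = []): boolean {
  if (lastErrors.length === 0) {
    return false; // no errors at all, so there is nothing to classify
  }
  return lastErrors.every((err) =>
    (err.codes ?? []).some((code) => USER_ERROR_CODES.has(code)),
  );
}
```

Note that a sketch like this only looks at `lastErrors`. A constraint in the shoot status would not influence the result unless it is also reported as an error with a matching code.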

petersutter · Sep 23 '24

ping @acumino

petersutter · Oct 09 '24

`CRDsWithProblematicConversionWebhooks` is just a constraint in the shoot status, not an error. It should not be taken into account for "no operator action required", because doing so could cause other errors to be ignored.

```yaml
constraints:
  - type: CRDsWithProblematicConversionWebhooks
    status: 'False'
    lastTransitionTime: ''
    lastUpdateTime: ''
    reason: CRDsWithProblematicConversionWebhooks
    message: >-
      Some CRDs in your cluster have multiple stored versions present and have
      a conversion webhook configured: <webhook-name>. Please see
      https://github.com/gardener/gardener/blob/master/docs/usage/shoot/shoot_status.md#constraints
      for more details.
```

acumino · Nov 14 '24

Currently, when clusters in the dashboard are sorted by "Issue since", this constraint is also taken into account. It would be better to exclude it, because it gives the impression that the cluster has had an issue for a very long time, even when the actual error is a transient one lasting only a few seconds.
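The proposal above can be sketched as follows: compute "Issue since" as the earliest transition time of any entry that is not `True`, while skipping an explicit list of ignored types. The `StatusEntry` shape, the `IGNORED_FOR_ISSUE_SINCE` set, and the `issueSince` function are hypothetical and only illustrate the idea. They are not the dashboard's actual implementation.

```typescript
// Hypothetical, simplified shape shared by shoot conditions and constraints.
interface StatusEntry {
  type: string;
  status: string; // e.g. "True", "False", "Unknown"
  lastTransitionTime: string; // ISO 8601 timestamp, may be empty
}

// Assumption for illustration: constraint types that should not drive "Issue since".
const IGNORED_FOR_ISSUE_SINCE: ReadonlySet<string> = new Set([
  "CRDsWithProblematicConversionWebhooks",
]);

// Returns the earliest lastTransitionTime among entries that are not "True"
// and not ignored, or undefined if no relevant issue is found.
function issueSince(entries: StatusEntry[]): Date | undefined {
  let earliest: Date | undefined;
  for (const entry of entries) {
    if (entry.status === "True" || IGNORED_FOR_ISSUE_SINCE.has(entry.type)) {
      continue;
    }
    const t = new Date(entry.lastTransitionTime);
    if (Number.isNaN(t.getTime())) {
      continue; // skip empty or malformed timestamps such as ''
    }
    if (earliest === undefined || t.getTime() < earliest.getTime()) {
      earliest = t;
    }
  }
  return earliest;
}
```

With this approach, a long-standing `CRDsWithProblematicConversionWebhooks` constraint no longer pushes "Issue since" back in time. Only the transient error determines the value.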

acumino · Nov 14 '24