assisted-service icon indicating copy to clipboard operation
assisted-service copied to clipboard

MGMT-17826: Disable LocalClusterImportController for non ACM/MCE

Open paul-maidment opened this issue 1 year ago • 4 comments

When installed outside of an ACM environment, the Infrastructure Operator has a panic. This is because the ManagedCluster CRD, which is required by the LocalClusterImportController feature is not present.

This PR checks for the presence of that CRD, if the CRD is not found then a condition is set to indicate the reason and reconciliation ends.

List all the issues related to this PR

  • [ ] New Feature
  • [ ] Enhancement
  • [x] Bug fix
  • [x] Tests
  • [ ] Documentation
  • [ ] CI/CD

What environments does this code impact?

  • [x] Automation (CI, tools, etc)
  • [x] Cloud
  • [x] Operator Managed Deployments
  • [x] None

How was this code tested?

  • [ ] assisted-test-infra environment
  • [ ] dev-scripts environment
  • [ ] Reviewer's test appreciated
  • [x] Waiting for CI to do a full test run
  • [ ] Manual (Elaborate on how it was tested)
  • [x] No tests needed

Checklist

  • [x] Title and description added to both, commit and PR.
  • [x] Relevant issues have been associated (see CONTRIBUTING guide)
  • [x] This change does not require a documentation update (docstring, docs, README, etc)
  • [x] Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

paul-maidment avatar May 16 '24 08:05 paul-maidment

@paul-maidment: This pull request references MGMT-17826 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.16.0" version, but no target version was set.

In response to this:

When installed outside of an ACM environment, the Infrastructure Operator has a panic. This is because the ManagedCluster CRD, which is required by the LocalClusterImportController feature is not present.

This PR checks for the presence of that CRD, if the CRD is not found then a condition is set to indicate the reason and reconciliation ends.

List all the issues related to this PR

  • [ ] New Feature
  • [ ] Enhancement
  • [x] Bug fix
  • [x] Tests
  • [ ] Documentation
  • [ ] CI/CD

What environments does this code impact?

  • [x] Automation (CI, tools, etc)
  • [x] Cloud
  • [x] Operator Managed Deployments
  • [x] None

How was this code tested?

  • [ ] assisted-test-infra environment
  • [ ] dev-scripts environment
  • [ ] Reviewer's test appreciated
  • [x] Waiting for CI to do a full test run
  • [ ] Manual (Elaborate on how it was tested)
  • [x] No tests needed

Checklist

  • [x] Title and description added to both, commit and PR.
  • [x] Relevant issues have been associated (see CONTRIBUTING guide)
  • [x] This change does not require a documentation update (docstring, docs, README, etc)
  • [x] Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar May 16 '24 08:05 openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: paul-maidment

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci[bot] avatar May 16 '24 08:05 openshift-ci[bot]

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 68.41%. Comparing base (6e2795d) to head (b95c0ee).

:exclamation: Current head b95c0ee differs from pull request most recent head c04616a

Please upload reports for the commit c04616a to get more accurate results.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6325      +/-   ##
==========================================
- Coverage   68.43%   68.41%   -0.02%     
==========================================
  Files         247      247              
  Lines       36169    36169              
==========================================
- Hits        24751    24746       -5     
- Misses       9231     9233       +2     
- Partials     2187     2190       +3     

see 1 file with indirect coverage changes

codecov[bot] avatar May 16 '24 08:05 codecov[bot]

/retest

paul-maidment avatar May 19 '24 07:05 paul-maidment

/retest

paul-maidment avatar May 22 '24 08:05 paul-maidment

/cc @avishayt @ori-amizur please review

gamli75 avatar May 27 '24 12:05 gamli75

@gamli75: GitHub didn't allow me to request PR reviews from the following users: please, review.

Note that only openshift members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @avishayt @ori-amizur please review

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci[bot] avatar May 27 '24 12:05 openshift-ci[bot]

/cc @avishayt

paul-maidment avatar May 27 '24 12:05 paul-maidment

/cc @ori-amizur

paul-maidment avatar May 27 '24 12:05 paul-maidment

As per our discussion and some further testing. It's more beneficial to us to leave things the way they are. I'm going to close this PR as per that discussion.

paul-maidment avatar May 28 '24 18:05 paul-maidment

Re-opening as I have discovered that the infrastructure operator will not function correctly unless we handle this issue.

paul-maidment avatar Jun 13 '24 12:06 paul-maidment

@paul-maidment: This pull request references MGMT-17826 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.17.0" version, but no target version was set.

In response to this:

When installed outside of an ACM environment, the Infrastructure Operator has a panic. This is because the ManagedCluster CRD, which is required by the LocalClusterImportController feature is not present.

This PR checks for the presence of that CRD, if the CRD is not found then a condition is set to indicate the reason and reconciliation ends.

List all the issues related to this PR

  • [ ] New Feature
  • [ ] Enhancement
  • [x] Bug fix
  • [x] Tests
  • [ ] Documentation
  • [ ] CI/CD

What environments does this code impact?

  • [x] Automation (CI, tools, etc)
  • [x] Cloud
  • [x] Operator Managed Deployments
  • [x] None

How was this code tested?

  • [ ] assisted-test-infra environment
  • [ ] dev-scripts environment
  • [ ] Reviewer's test appreciated
  • [x] Waiting for CI to do a full test run
  • [ ] Manual (Elaborate on how it was tested)
  • [x] No tests needed

Checklist

  • [x] Title and description added to both, commit and PR.
  • [x] Relevant issues have been associated (see CONTRIBUTING guide)
  • [x] This change does not require a documentation update (docstring, docs, README, etc)
  • [x] Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Jun 13 '24 12:06 openshift-ci-robot

@paul-maidment: This pull request references MGMT-17826 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.17.0" version, but no target version was set.

In response to this:

When installed outside of an ACM environment, the Infrastructure Operator has a panic. This is because the ManagedCluster CRD, which is required by the LocalClusterImportController feature is not present.

This PR prevents the operator from being started if the ManagedCluster CRD is not present. With a message that makes it clear to the user why this occurred.

List all the issues related to this PR

  • [ ] New Feature
  • [ ] Enhancement
  • [x] Bug fix
  • [x] Tests
  • [ ] Documentation
  • [ ] CI/CD

What environments does this code impact?

  • [x] Automation (CI, tools, etc)
  • [x] Cloud
  • [x] Operator Managed Deployments
  • [x] None

How was this code tested?

  • [ ] assisted-test-infra environment
  • [ ] dev-scripts environment
  • [ ] Reviewer's test appreciated
  • [x] Waiting for CI to do a full test run
  • [ ] Manual (Elaborate on how it was tested)
  • [x] No tests needed

Checklist

  • [x] Title and description added to both, commit and PR.
  • [x] Relevant issues have been associated (see CONTRIBUTING guide)
  • [x] This change does not require a documentation update (docstring, docs, README, etc)
  • [x] Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Jun 13 '24 14:06 openshift-ci-robot

as I have discovered that the infrastructure operator will not function correctly unless we handle this issue

Can you elaborate a bit more? Ideally we'd have the failing behavior explained in the bug and also here in more detail.

carbonin avatar Jun 13 '24 15:06 carbonin

While I think we should consider this, I'm still split.

All customers would install the operator through MCE and thus would definitely have ManagedCluster, but we also publish our operator in the community operators catalog to install standalone. This change would break that workflow.

That said, we do the same thing for BMH and treating the two resources the same would make sense. But on the other hand booting using a BMH is a much more common use case than managing the local cluster with assisted.

@avishayt what do you think?

carbonin avatar Jun 13 '24 18:06 carbonin

as I have discovered that the infrastructure operator will not function correctly unless we handle this issue

Can you elaborate a bit more? Ideally we'd have the failing behavior explained in the bug and also here in more detail.

It is effectively a crash as soon as an attempt is made to list Managed cluster resources when the definition is not present.

So in a way it is very similar to the behaviour I have placed in this PR, with the exception that the code in this PR makes it very clear why the error occurs.

paul-maidment avatar Jun 13 '24 19:06 paul-maidment

/retest

paul-maidment avatar Jun 13 '24 19:06 paul-maidment

While I think we should consider this, I'm still split.

All customers would install the operator through MCE and thus would definitely have ManagedCluster, but we also publish our operator in the community operators catalog to install standalone. This change would break that workflow.

That said, we do the same thing for BMH and treating the two resources the same would make sense. But on the other hand booting using a BMH is a much more common use case than managing the local cluster with assisted.

@avishayt what do you think?

I think my original proposal to terminate the LocalClusterImportController reconcile loop early in the case of Managed cluster CRD not present is reasonable.

It prevents a hard crash while allowing the rest of the operator to be functional.

After creation of the Managed cluster CRD. The operator is able to start (though the user will need to force a reconcile)

paul-maidment avatar Jun 13 '24 19:06 paul-maidment

@paul-maidment: This pull request references MGMT-17826 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.17.0" version, but no target version was set.

In response to this:

When the ManagedCluster CRD is not present during installation of the infrastructure operator, a failure is encountered

"failed to get API group resources: unable to retrieve the complete list of server APIs: cluster.open-cluster-management.io/v1: the server could not find the requested resource"

This results in a non functional operator.

This PR aims to address that by ensuring that we do not complete the reconcile loop (and therefore perfom the check for the presence of a ManagedCluster CR) if the CRD is not present.

List all the issues related to this PR

  • [ ] New Feature
  • [ ] Enhancement
  • [x] Bug fix
  • [x] Tests
  • [ ] Documentation
  • [ ] CI/CD

What environments does this code impact?

  • [x] Automation (CI, tools, etc)
  • [x] Cloud
  • [x] Operator Managed Deployments
  • [x] None

How was this code tested?

  • [ ] assisted-test-infra environment
  • [ ] dev-scripts environment
  • [ ] Reviewer's test appreciated
  • [x] Waiting for CI to do a full test run
  • [ ] Manual (Elaborate on how it was tested)
  • [x] No tests needed

Checklist

  • [x] Title and description added to both, commit and PR.
  • [x] Relevant issues have been associated (see CONTRIBUTING guide)
  • [x] This change does not require a documentation update (docstring, docs, README, etc)
  • [x] Does this change include unit-tests (note that code changes require unit-tests)

Reviewers Checklist

  • Are the title and description (in both PR and commit) meaningful and clear?
  • Is there a bug required (and linked) for this change?
  • Should this PR be backported?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot avatar Jun 16 '24 14:06 openshift-ci-robot