cluster-api-provider-azure icon indicating copy to clipboard operation
cluster-api-provider-azure copied to clipboard

Add support to disable CAPZ components through a manager flag

Open bryan-cox opened this issue 7 months ago • 14 comments

What type of PR is this? /kind bug

What this PR does / why we need it: Adds the ability to disable CAPZ components through a manager flag. Flags added for disabling ASO Secret Controller and disabling Azure JSON Machine Controller.

Which issue(s) this PR fixes: Fixes #5472

Special notes for your reviewer:

TODOs:

  • [ ] squashed commits
  • [ ] includes documentation
  • [ ] adds unit tests
  • [ ] cherry-pick candidate

Release note:

Adds the ability to disable CAPZ components through a manager flag. Flags added for disabling ASO Secret Controller and disabling Azure JSON Machine Controller.

bryan-cox avatar Apr 07 '25 19:04 bryan-cox

/assign @nawazkh

bryan-cox avatar Apr 07 '25 19:04 bryan-cox

Codecov Report

Attention: Patch coverage is 13.88889% with 31 lines in your changes missing coverage. Please review.

Project coverage is 52.82%. Comparing base (3a9eb5b) to head (f82a3ac). Report is 146 commits behind head on main.

Files with missing lines Patch % Lines
main.go 0.00% 31 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5552      +/-   ##
==========================================
- Coverage   53.27%   52.82%   -0.45%     
==========================================
  Files         272      279       +7     
  Lines       29522    29629     +107     
==========================================
- Hits        15727    15652      -75     
- Misses      12980    13160     +180     
- Partials      815      817       +2     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

codecov[bot] avatar Apr 07 '25 20:04 codecov[bot]

the AzureManagedControlPlane CR which is behind the ASOAPI feature gate.

This is incorrect. The ASOAPI feature gate only enables controllers for the AzureASOManagedControlPlane and the other AzureASO... resources. This ASO secret controller is necessary for all of the resource types that CAPZ manages with ASO, including resource groups and vnets which are created for every AzureCluster and AzureManagedControlPlane. I suspect that if you disable the ASOAPI feature gate in CI, then all of the e2e tests will blow up, not just the ones exercising the AzureASO... APIs.

I think I mentioned somewhere, maybe in #5099 or in a Slack thread related to that PR, that a better solution to this general problem IMO would be a generic toggle to enable or disable individual controllers and webhooks with command line flags. Either something like that, or we explicitly disclaim all support for any installation that is not exactly equivalent to the CRDs and other manifests we publish for releases, since we generally assume the controller manager is running when all of the CRDs are installed. Changing the meaning of existing feature gates isn't a sustainable way to solve the general "I didn't install a CRD and now the controller manager is crashing" problem.

nojnhuh avatar Apr 08 '25 01:04 nojnhuh

that a better solution to this general problem IMO would be a generic toggle to enable or disable individual controllers and webhooks with command line flags.

I agree, fwiw created this some time ago to track that effort https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/5294

enxebre avatar Apr 08 '25 07:04 enxebre

This is incorrect. The ASOAPI feature gate only enables controllers for the AzureASOManagedControlPlane and the other AzureASO... resources. This ASO secret controller is necessary for all of the resource types that CAPZ manages with ASO, including resource groups and vnets which are created for every AzureCluster and AzureManagedControlPlane. I suspect that if you disable the ASOAPI feature gate in CI, then all of the e2e tests will blow up, not just the ones exercising the AzureASO... APIs.

Thank you for adding more context on this Jon.

that a better solution to this general problem IMO would be a generic toggle to enable or disable individual controllers and webhooks with command line flags.

I agree, fwiw created this some time ago to track that effort #5294

I agree also. We could start with updating manager.yaml with a bunch of env variables that enable different controllers and webhooks.


So we essentially close out this PR @bryan-cox ?

nawazkh avatar Apr 08 '25 17:04 nawazkh

/test all

bryan-cox avatar Apr 30 '25 10:04 bryan-cox

/test all

bryan-cox avatar Apr 30 '25 11:04 bryan-cox

@nojnhuh @nawazkh - when you have a moment, can I get another look at this PR? If we are good with moving this direction, I can follow up with some doc on how this can be used.

bryan-cox avatar May 02 '25 12:05 bryan-cox

@nojnhuh @nawazkh - when you have a moment, can I get another look at this PR? If we are good with moving this direction, I can follow up with some doc on how this can be used.

I like the idea of being able to toggle controllers, so green flag from me on this approach.

However, we need to ensure that the Management cluster is functional despite turning off controller(maybe all in the future?), so maybe we also incorporate an e2e test to validate the functionality. We could add that test scenario in an optional test.

@bryan-cox , what do you say ?

nawazkh avatar May 02 '25 22:05 nawazkh

Un-assigning myself for now since I won't be able to get to this immediately. /unassign

nawazkh avatar May 30 '25 19:05 nawazkh

@bryan-cox Let me know if this is ready for another round of reviews!

/assign

willie-yao avatar Jun 05 '25 16:06 willie-yao

Hey @willie-yao this is ready for review from my perspective. Though I didn't add anymore tests from https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/5552#issuecomment-2848222375. If we want those, the PR will still need some work.

bryan-cox avatar Jun 05 '25 16:06 bryan-cox

/assign

willie-yao avatar Jun 26 '25 16:06 willie-yao

Hey @willie-yao 👋🏻 - I made an issue to keep track of adding an e2e for the functionality introduced in this PR, https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/5744, as we discussed a few weeks ago in the CAPZ weekly sync.

Please let me know if there is anything further I can do to help get this PR to merged.

bryan-cox avatar Jul 11 '25 18:07 bryan-cox

/test pull-cluster-api-provider-azure-e2e

bryan-cox avatar Jul 11 '25 19:07 bryan-cox

/retest

bryan-cox avatar Jul 12 '25 02:07 bryan-cox

@willie-yao this should be ready for you again. I fixed the issue and squashed the commits.

bryan-cox avatar Jul 15 '25 14:07 bryan-cox

LGTM label has been added.

Git tree hash: 6285739c96f72a1f0997d6fd71726c0eea7789cd

k8s-ci-robot avatar Jul 15 '25 17:07 k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: willie-yao

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Jul 15 '25 17:07 k8s-ci-robot

/cherry-pick release-1.20

mboersma avatar Jul 17 '25 16:07 mboersma

/cherry-pick release-1.19

mboersma avatar Jul 17 '25 16:07 mboersma

@mboersma: new pull request created: #5758

In response to this:

/cherry-pick release-1.20

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mboersma: new pull request created: #5759

In response to this:

/cherry-pick release-1.19

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.