cluster-api-provider-azure icon indicating copy to clipboard operation
cluster-api-provider-azure copied to clipboard

Add support for ClusterClass and managed clusters

Open AAkindele opened this issue 3 years ago • 5 comments

/kind feature

Describe the solution you'd like [A clear and concise description of what you want to happen.]

Add support for using ClusterClass when working with managed AKS clusters. Ideally, there would be a flavor for AKS that uses ClusterClass to make is easier for first time users to get started with ClusterClass and managed clusters.

Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]

A use case for this is the ability to use lifecycle hook runtime extensions, https://cluster-api.sigs.k8s.io/tasks/experimental-features/runtime-sdk/implement-lifecycle-hooks.html. Clusters need to be created using ClusterClass in order for the hooks to work.

Environment:

  • cluster-api-provider-azure version: latest
  • Kubernetes version: (use kubectl version): 1.25.2
  • OS (e.g. from /etc/os-release): 20.04.3

AAkindele avatar Sep 29 '22 17:09 AAkindele

/area managedclusters

CecileRobertMichon avatar Oct 06 '22 17:10 CecileRobertMichon

Thanks for opening this issue @AAkindele!

The main change required to get ClusterClass working with managed clusters is going to be to add AzureManagedClusterTemplate and AzureManagedControlPlaneTemplate as described in https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20210526-cluster-class-and-managed-topologies.md#provider-implementation

In addition, we also need to evaluate whether CAPZ should implement recommendations outlined in https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220725-managed-kubernetes.md, which would involve moving some of the fields in AzureManagedCluster to AzureManagedControlPlane (breaking change). This is outlined in https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/2602/files#diff-f732765b7a645dc84b492225dc4db5fe8f47f49199a28d65a785f1d604714955R49

@jackfrancis, I wonder if it would make sense to add ClusterClass support as part of https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/2602 as well?

CecileRobertMichon avatar Oct 06 '22 17:10 CecileRobertMichon

This issue about MachinePool, https://github.com/kubernetes-sigs/cluster-api/issues/5991, is referenced in the Managed Kubernetes proposal here, https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220725-managed-kubernetes.md#clusterclass-support-for-machinepool. Would this be a blocker to adding ClusterClass managed cluster support?

@CecileRobertMichon, for the breaking change option, do you happen to have a list of the fields involved or other information on the scope of the breaking change. I am new to the codebase, but I'm willing to help gather the information if pointed in the right direction. Thanks.

AAkindele avatar Oct 13 '22 19:10 AAkindele

Would this be a blocker to adding ClusterClass managed cluster support?

Yes that would be required since AKS only supports MachinePools (not MachineDeployments). Good catch!

for the breaking change option, do you happen to have a list of the fields involved or other information on the scope of the breaking change

That proposal / implementation for CAPZ + AKS doesn't exist yet, and no I don't know which fields would be moved off the top of my head, but the changes needed are described here from a GCP perspective: https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220725-managed-kubernetes.md#option-3-two-kinds-with-a-managed-control-plane-and-managed-infra-cluster-with-better-separation-of-responsibilities, we would want the equivalent changes for CAPZ.

@richardcase @pydctw are you able to provide more guidance on the above?

CecileRobertMichon avatar Oct 13 '22 21:10 CecileRobertMichon

I think the first step would be a detailed issue or design doc that describes: what is the current state of the CAPZ AzureManagedCluster API, what exact field changes would be needed to implement the recommendations made by CAPI , what are the pros and cons + risks. @AAkindele is that something you are interested in helping with? Perhaps with support and guidance from @jackfrancis since you are relatively new to CAPZ.

CecileRobertMichon avatar Oct 13 '22 21:10 CecileRobertMichon

👍 Yep! I'm interested in helping out.

AAkindele avatar Oct 18 '22 15:10 AAkindele

@richardcase @pydctw are you able to provide more guidance on the above?

Would it be helpful to have a call to discuss? We have also been thinking about the same with CAPA, which is problematic as we GA's EKS support.

richardcase avatar Oct 19 '22 13:10 richardcase

PTAL @AAkindele @richardcase: https://github.com/kubernetes-sigs/cluster-api-provider-azure/pull/2739

We are going to discuss at tomorrow's CAPZ office hours (9am PT) if either of you are able to join. If not, feel free to leave your thought in the PR above async.

CecileRobertMichon avatar Oct 20 '22 00:10 CecileRobertMichon

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 18 '23 00:01 k8s-triage-robot

/assign

willie-yao avatar Jan 30 '23 19:01 willie-yao

/lifecycle active

willie-yao avatar Jan 31 '23 21:01 willie-yao

To implement clusterclass for managed clusters, we need to determine a list of fields that are appropriate to be shared across clusters using the clusterclass template. AzureManagedCluster only has one field ControlPlaneEndpoint that won't be shared. However, there are many fields in AzureManagedControlPlane that cannot be shared. Here is a preliminary list of fields to omit from the AzureManagedControlPlaneTemplate, taken from azuremanagedcontrolplane_types.go. Every other field will be included in the template.

  • ResourceGroupName
  • NodeResourceGroupName
  • VirtualNetwork *Include VirtualNetwork.cidrBlock
  • ControlPlaneEndpoint
  • SSHPublicKey
  • ~~DNSServiceIP~~
  • ~~AADProfile.AdminGroupObjectIDs~~
  • APIServerAccessProfile.AuthorizedIPRanges

Please let me know what you think about this list and if there are any fields that are missing. For reference, I omitted fields that contain unique identifiers such as names and IPs that can't be shared.

cc @CecileRobertMichon

willie-yao avatar Feb 15 '23 19:02 willie-yao

/milestone v1.8

CecileRobertMichon avatar Feb 15 '23 21:02 CecileRobertMichon

Thanks @willie-yao. That list looks good to me. The only one I'm not 100% sure about is AADProfile.AdminGroupObjectIDs, it might make sense to share that one across clusters if you have an existing AAD group you want to use to be admin on all your clusters (at least that's my interpretation of https://learn.microsoft.com/en-us/azure/aks/managed-aad). What led you to include it in the no-sharing list?

CecileRobertMichon avatar Feb 15 '23 21:02 CecileRobertMichon

cc @dtzar @nojnhuh @jackfrancis @mtougeron for more eyes on https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/2684#issuecomment-1431887870

CecileRobertMichon avatar Feb 15 '23 21:02 CecileRobertMichon

Is it possible with ClusterClass to share part of a field? e.g. it seems like we might be able to share the vnet's cidrBlock since that itself doesn't refer to the name or resource group of the actual vnet. I saw a couple others like that under the vnet spec.

Could DNSServiceIP also be shared? I think that's a private IP address (by default at least) so two clusters could have the same value for that.

Besides those two things looks good to me!

nojnhuh avatar Feb 15 '23 22:02 nojnhuh

What led you to include it in the no-sharing list?

I definitely misunderstood the role of AdminGroupObjectIDs, and thought that the objectIDs were specific to a single cluster, rather than an AAD group you can apply to multiple clusters. I'll remove that from the list!

willie-yao avatar Feb 15 '23 22:02 willie-yao

Is it possible with ClusterClass to share part of a field?

Yes it is. I think it'll make sense to include the cidrBlock as well. I also thought that DNSServiceIP was not private so that's why I left it out. I'll go ahead and include those two fields!

willie-yao avatar Feb 15 '23 22:02 willie-yao

Update: Will also be omitting the APIServerAccessProfile.AuthorizedIPRanges field as it uses public IP addresses. More documentation on this field can be found here

willie-yao avatar Feb 16 '23 23:02 willie-yao

We are currently blocked on this feature until CAPI ClusterClass adds support for MachinePools. It is being worked on! I will still be implementing the types and an E2E test and will validate it once MachinePool support is added.

https://github.com/kubernetes-sigs/cluster-api/issues/5991

willie-yao avatar Feb 17 '23 00:02 willie-yao

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jul 04 '23 21:07 k8s-triage-robot

/remove-lifecycle stale

richardcase avatar Jul 11 '23 14:07 richardcase