cluster-api-provider-aws Tasks for adopting CAPI's Server Side Apply

This issue tracks the tasks needed to make CAPI's Server-Side Apply (SSA) work with CAPA.

Why do we need this?

When ClusterClass is used, CAPA's spec.network.subnets is co-authored by the CAPI and CAPA controllers. To manage co-authored slices like this one and stop both controllers from continuously patching them, CAPI now uses Server-Side Apply.

[x] Issue: https://github.com/kubernetes-sigs/cluster-api/issues/6320
[x] Solution: https://github.com/kubernetes-sigs/cluster-api/pull/6495

Changes Required in CAPA

[ ] https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/3531

The following issues require the v1beta2 API version bump as a prerequisite.

[ ] https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/3528
[ ] https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/3536

Other Issues to Follow

[x] https://github.com/kubernetes-sigs/controller-tools/pull/692
- This is needed to generate CRD manifests with the list markers correctly. Currently, we use a hack to work around this issue.
[x] https://github.com/kubernetes-sigs/cluster-api/issues/6650
- There is an issue with controller metadata in logging: the log prints the wrong controller type and kind.

CAPA issues that will be resolved

[ ] https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/3399
[ ] https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/3397

Jun 15 '22 11:06 pydctw

PoC

While waiting for the controller-tools and listMapKey issues to be worked on, I did an initial PoC with test-purpose CRDs. This required some hacks, so the result needs to be confirmed once all the tasks listed in the Changes Required in CAPA section are completed.

Hacks

Used []SubnetSpec, a slice, as the type of Subnets for CRD manifest generation.

// +optional
// +listType=map
// +listMapKey=id
Subnets []SubnetSpec `json:"subnets,omitempty"`

Made subnet.id a required field in the CRD so that it can be used as the listMapKey.
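To illustrate why the list needs a map key, here is a minimal Go sketch of merging two subnet lists by id. It is only a conceptual model, not how the API server or CAPI implements Server-Side Apply: the Subnet type, the mergeByID helper, the subnet-a ID, and the CIDR value are all hypothetical and exist only for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// Subnet is a hypothetical, trimmed-down stand-in for CAPA's SubnetSpec.
type Subnet struct {
	ID               string
	AvailabilityZone string
	CidrBlock        string
}

// mergeByID combines two lists the way a listType=map list keyed on "id"
// behaves conceptually: entries with the same ID are merged field by field,
// and entries present in only one list are kept as they are.
func mergeByID(a, b []Subnet) []Subnet {
	byID := map[string]Subnet{}
	for _, list := range [][]Subnet{a, b} {
		for _, s := range list {
			cur := byID[s.ID]
			cur.ID = s.ID
			if s.AvailabilityZone != "" {
				cur.AvailabilityZone = s.AvailabilityZone
			}
			if s.CidrBlock != "" {
				cur.CidrBlock = s.CidrBlock
			}
			byID[s.ID] = cur
		}
	}
	out := make([]Subnet, 0, len(byID))
	for _, s := range byID {
		out = append(out, s)
	}
	// Sort by ID so the result does not depend on map iteration order.
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

func main() {
	// One writer sets the availability zone, the other sets the CIDR block
	// (both values are made up for illustration).
	fromTopology := []Subnet{{ID: "subnet-a", AvailabilityZone: "us-west-1c"}}
	fromProvider := []Subnet{{ID: "subnet-a", CidrBlock: "10.0.0.0/24"}}
	for _, s := range mergeByID(fromTopology, fromProvider) {
		fmt.Printf("%+v\n", s)
	}
}
```

Without a key, the list is treated as a single atomic value, so each writer would replace the other's entries instead of contributing fields to the same subnet.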

Scenario: BYO Infra Case

AWSClusterTemplate in ClusterClass

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSClusterTemplate
metadata:
  name: um-ec2-clusterclass-v1
spec:
  template:
    spec:
      network:
        vpc:
          id: vpc-0e38e0a4712b9b316
        subnets:
          - id: subnet-0588d98dd78abf69b
            availabilityZone: us-west-1c
            isPublic: true
          - id: subnet-0454fcf4f534539df
            availabilityZone: us-west-1c
      region: REPLACEME
      sshKeyName: REPLACEME

Findings

Observed that the AWSCluster .spec.network.subnets value doesn't oscillate. Before SSA, both the CAPA and CAPI controllers constantly patched the field, so its value kept changing, as observed here.
The managed fields show that both the CAPI and CAPA controllers own parts of .spec.network.subnets:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
metadata:
  ...
  managedFields:
  - apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          f:cluster.x-k8s.io/cloned-from-groupkind: {}
          f:cluster.x-k8s.io/cloned-from-name: {}
        f:labels:
          f:cluster.x-k8s.io/cluster-name: {}
          f:topology.cluster.x-k8s.io/owned: {}
      f:spec:
        f:bastion:
          f:allowedCIDRBlocks: {}
          f:enabled: {}
        f:controlPlaneLoadBalancer:
          f:crossZoneLoadBalancing: {}
          f:scheme: {}
        f:identityRef:
          f:kind: {}
          f:name: {}
        f:network:
          f:cni:
            f:cniIngressRules: {}
          f:subnets: ⬅️
            k:{"id":"subnet-0454fcf4f534539df"}:
              .: {}
              f:availabilityZone: {}
              f:id: {}
              f:isPublic: {}
            k:{"id":"subnet-0588d98dd78abf69b"}:
              .: {}
              f:availabilityZone: {}
              f:id: {}
              f:isPublic: {}
          f:vpc:
            f:availabilityZoneSelection: {}
            f:availabilityZoneUsageLimit: {}
            f:id: {}
        f:region: {}
        f:sshKeyName: {}
    manager: capi-topology ⬅️
    operation: Apply
    time: "2022-06-15T12:54:06Z"
  - apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .: {}
          v:"awscluster.infrastructure.cluster.x-k8s.io": {}
      f:spec:
        f:controlPlaneEndpoint:
          f:host: {}
          f:port: {}
        f:network:
          f:subnets: ⬅️
            k:{"id":"subnet-0454fcf4f534539df"}:
              f:cidrBlock: {}
              f:routeTableId: {}
              f:tags:
                .: {}
                f:Name: {}
                f:kubernetes.io/cluster/um-ec2-cc-cluster: {}
                f:kubernetes.io/cluster/um-ec2-cluster: {}
                f:kubernetes.io/role/internal-elb: {}
            k:{"id":"subnet-0588d98dd78abf69b"}:
              f:cidrBlock: {}
              f:natGatewayId: {}
              f:routeTableId: {}
              f:tags:
                .: {}
                f:Name: {}
                f:kubernetes.io/cluster/um-ec2-cc-cluster: {}
                f:kubernetes.io/cluster/um-ec2-cluster: {}
                f:kubernetes.io/role/elb: {}
          f:vpc:
            f:cidrBlock: {}
            f:tags:
              .: {}
              f:Name: {}
    manager: cluster-api-provider-aws-controller ⬅️
    operation: Update
    time: "2022-06-15T12:55:43Z"
    ...
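The ownership split shown above can also be checked programmatically. The following Go sketch assumes the fieldsV1 content of each managedFields entry is available as JSON; the subnetFields helper is hypothetical, and the sample documents are trimmed-down copies of the output above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// subnetFields returns, for each subnet key (k:{...}), the names of the
// top-level fields that one manager owns under spec.network.subnets.
func subnetFields(fieldsV1 string) (map[string][]string, error) {
	var root map[string]interface{}
	if err := json.Unmarshal([]byte(fieldsV1), &root); err != nil {
		return nil, err
	}
	node := root
	for _, step := range []string{"f:spec", "f:network", "f:subnets"} {
		next, ok := node[step].(map[string]interface{})
		if !ok {
			// This manager owns nothing under spec.network.subnets.
			return map[string][]string{}, nil
		}
		node = next
	}
	owned := map[string][]string{}
	for key, v := range node {
		entry, ok := v.(map[string]interface{})
		if !ok || !strings.HasPrefix(key, "k:") {
			continue
		}
		for name := range entry {
			// Skip the "." marker, which denotes the list entry itself.
			if strings.HasPrefix(name, "f:") {
				owned[key] = append(owned[key], strings.TrimPrefix(name, "f:"))
			}
		}
		sort.Strings(owned[key])
	}
	return owned, nil
}

func main() {
	managers := []struct{ name, fields string }{
		{"capi-topology", `{"f:spec":{"f:network":{"f:subnets":{"k:{\"id\":\"subnet-0454fcf4f534539df\"}":{".":{},"f:availabilityZone":{},"f:id":{},"f:isPublic":{}}}}}}`},
		{"cluster-api-provider-aws-controller", `{"f:spec":{"f:network":{"f:subnets":{"k:{\"id\":\"subnet-0454fcf4f534539df\"}":{"f:cidrBlock":{},"f:routeTableId":{},"f:tags":{".":{},"f:Name":{}}}}}}}`},
	}
	for _, m := range managers {
		owned, err := subnetFields(m.fields)
		if err != nil {
			panic(err)
		}
		for key, fields := range owned {
			fmt.Println(m.name, key, fields)
		}
	}
}
```

If the two managers report disjoint field sets for the same subnet key, as in the output above, neither controller overwrites the other's fields.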

Jun 15 '22 16:06 pydctw

/triage accepted
/priority important-soon

Jun 15 '22 19:06 sedefsavas

OCI provider fix for the problem: https://github.com/oracle/cluster-api-provider-oci/pull/116

Jul 28 '22 19:07 sedefsavas
