kops icon indicating copy to clipboard operation
kops copied to clipboard

AWS: Cluster requires designating subnets by both ID and containing AZ

Open seh opened this issue 3 years ago • 13 comments

1. What kops version are you running?

1.25.3

2. What Kubernetes version are you running?

1.24.7

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

  1. Write a Cluster manifest that include several "spec.subnets" entries with both the "id" and "zone" fields populated.
  2. Run kops replace --filename=... to store that desired state.
  3. Observe that kOps accepts this designation of subnets via ID and containing AZ.
  4. Run kops edit cluster and remove one or more of the "spec.subnets[*].zone" fields.
    The subnet ID should be enough designation, given that AWS already knows the containing AZ for each subnet.
  5. Observe that kOps rejects the "zone" field omission, complaining as follows, per the fi/cloudup/awsup.FindRegion function:
    Error: invalid AWS zone: “” in subnet “utility-us-east-2a”

5. What happened after the commands executed?

kOps continues to mandate that the Cluster spec designate AWS EC2 subnets by both ID and containing AZ.

6. What did you expect to happen?

Given that cloudup is able to look up a subnet's containing AZ given its ID, why do we require that the Cluster designate subnets by both of these details? At best, the nominated AZ will match what AWS already knows.

7. Please provide your cluster manifest.

Cluster manifest
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: my-cluster
spec:
  additionalSans:
  - something.example.com
  api:
    loadBalancer:
      additionalSecurityGroups:
      - sg-0e73440d5420efaca
      class: Network
      crossZoneLoadBalancing: true
      type: Public
  authorization:
    rbac: {}
  awsLoadBalancerController:
    enableWAF: true
    enableWAFv2: true
    enabled: true
  certManager:
    enabled: true
    managed: false
  cloudConfig:
    disableSecurityGroupIngress: true
    manageStorageClasses: false
  cloudProvider: aws
  clusterAutoscaler:
    balanceSimilarNodeGroups: true
    enabled: true
  configBase: s3://my-bucket/my-cluster
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-us-east-2a
      name: a
    - instanceGroup: master-us-east-2b
      name: b
    - instanceGroup: master-us-east-2c
      name: c
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8081
      - name: ETCD_METRICS
        value: extensive
    name: main
  - etcdMembers:
    - instanceGroup: master-us-east-2a
      name: a
    - instanceGroup: master-us-east-2b
      name: b
    - instanceGroup: master-us-east-2c
      name: c
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8082
      - name: ETCD_METRICS
        value: basic
    name: events
  iam:
    allowContainerRegistry: true
    useServiceAccountExternalPermissions: true
  kubeAPIServer:
    featureGates:
      StatefulSetAutoDeletePVC: "true"
  kubeControllerManager:
    featureGates:
      StatefulSetAutoDeletePVC: "true"
  kubeProxy:
    enabled: false
  kubelet:
    anonymousAuth: false
    kubeReserved:
      cpu: 750m
      memory: .75Gi
  kubernetesVersion: 1.24.7
  metricsServer:
    enabled: true
  networkCIDR: 10.3.0.0/16
  networkID: vpc-03e705c4cddfee42b
  networking:
    calico:
      bpfEnabled: true
      crossSubnet: true
      encapsulationMode: vxlan
      typhaReplicas: 3
  nonMasqueradeCIDR: 100.64.0.0/10
  podIdentityWebhook:
    enabled: true
  serviceAccountIssuerDiscovery:
    discoveryStore: s3://some-oidc-bucket/my-cluster
    enableAWSOIDCProvider: true
  subnets:
  - cidr: 10.3.100.0/22
    id: subnet-090c2c40927d862dc
    name: utility-us-east-2a
    type: Utility
    zone: us-east-2a
  - cidr: 10.3.104.0/22
    id: subnet-03a87fdc7ead5ac0c
    name: utility-us-east-2b
    type: Utility
    zone: us-east-2b
  - cidr: 10.3.108.0/22
    id: subnet-0fc9d25fba3d58414
    name: utility-us-east-2c
    type: Utility
    zone: us-east-2c
  - cidr: 10.3.0.0/22
    egress: nat-0cd5a5fe7dd0b849f
    id: subnet-06eecb7efb2e834ae
    name: us-east-2a
    type: Private
    zone: us-east-2a
  - cidr: 10.3.4.0/22
    egress: nat-0e00e598753c7b918
    id: subnet-0439ad6bfde63be65
    name: us-east-2b
    type: Private
    zone: us-east-2b
  - cidr: 10.3.8.0/22
    egress: nat-0f936269f80667e78
    id: subnet-0dccfb6407fd18cb8
    name: us-east-2c
    type: Private
    zone: us-east-2c
  topology:
    masters: private
    nodes: private

8. Anything else do we need to know?

It's not clear whether this is a defect. I prefer to omit over-specifying details like these that we can at best only get wrong. If we want to note the containing AZ for a given subnet, we can write a comment to that effect in the Cluster manifest—given that we're using YAML and not JSON. If kOps needs to know the containing AZ, it should be able to consult the AWS API to learn of this detail from the related subnet.

See the preceding discussion in the "kops-users" channel of the "Kubernetes" Slack workspace.

/kind bug

seh avatar Dec 02 '22 13:12 seh

GCE uses the zone of an IG's subnet to fill the Zone field of the InstanceGroupManager it creates for the IG.

AWS uses the zones of subnets to find the best subnets to use for the API and bastion loadbalancers. It also uses the zone to add the subnet's additionalRoutes to the correct private routing table, to route to the correct NAT gateway in each private routing table, to tag at most one subnet per zone for use by load balancers, to link the zone to the correct routing table, and so on.

Hetzner uses the zones of the subnets to figure out the region the cluster is in.

johngmyers avatar Dec 02 '22 21:12 johngmyers

So maybe we can relax the requirement for subnets which have external egress, which don't have control plane nodes or bastions in them, and which aren't in GCE.

johngmyers avatar Dec 02 '22 21:12 johngmyers

I agree that there are many uses for the zone that contains a given subnet. The question I pose here is whether the user should be burdened with specifying this zone, or whether kOps should discover it on its own, so that the zone is correct per the cloud provider, rather than by the user's assertion.

seh avatar Dec 02 '22 21:12 seh

populateClusterSpec() might be able to fill in that field.

johngmyers avatar Dec 02 '22 21:12 johngmyers

Or Find()

johngmyers avatar Dec 02 '22 23:12 johngmyers

Office Hours: we would take a PR to implement this.

johngmyers avatar Dec 30 '22 17:12 johngmyers

Can someone explain me how is this a good first issue.

Sajiyah-Salat avatar Jan 24 '23 03:01 Sajiyah-Salat

It's a smallish, localized change.

johngmyers avatar Jan 24 '23 06:01 johngmyers

I agree that there are many uses for the zone that contains a given subnet. The question I pose here is whether the user should be burdened with specifying this zone, or whether kOps should discover it on its own, so that the zone is correct per the cloud provider, rather than by the user's assertion.

i understand the issue from above lines and I got that there will be some changes related to populateClusterSpec(). Can you guide me to the location and let me know which changes you would like to see. thank you.

Sajiyah-Salat avatar Jan 25 '23 02:01 Sajiyah-Salat

I believe the change needs to be in assignCIDRsToSubnets(). It currently fills in the CIDR field of the subnet based on what the cloud provider said the subnet's CIDR is. This needs to be extended to fill in the Zone field based on what the cloud provider says the zone is. It looks like the SubnetInfo struct already has a Zone field.

johngmyers avatar Jan 25 '23 03:01 johngmyers

A good test for this would be to remove the spec.subnets[0].zone setting in tests/integration/update_cluster/shared_subnet/in-v1alpha2.yaml

johngmyers avatar Jan 25 '23 03:01 johngmyers

Thanks for this. I will do it shortly

Sajiyah-Salat avatar Jan 25 '23 07:01 Sajiyah-Salat

Great do we need to add something there after removing it?

Sajiyah-Salat avatar Jan 27 '23 03:01 Sajiyah-Salat