
Cannot run kops update cluster on AWS with no spec.cloudControllerManager set when AWSEBSCSIDriver is not managed by kops

Open flopib opened this issue 2 years ago • 5 comments

/kind bug

1. What kops version are you running? The command kops version will display this information.

1.24.5

Update: issue also happens with 1.26.3

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running; otherwise, provide the Kubernetes version specified as a kops flag.

1.24.14

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

  • Make sure spec.cloudControllerManager is not set in the Cluster manifest
  • Set spec.kubernetesVersion to 1.24.x
  • Set spec.cloudConfig.awsEBSCSIDriver.enabled: false
  • kops replace -f manifest.yaml
  • kops update cluster k8s.local --target=terraform
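
A trimmed-down sketch of the manifest fields involved (cluster name and version taken from the steps above; the rest of the spec is omitted for brevity):

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: k8s.local
spec:
  kubernetesVersion: 1.24.14
  cloudConfig:
    awsEBSCSIDriver:
      enabled: false
  # note: spec.cloudControllerManager is deliberately not set anywhere in the spec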

5. What happened after the commands executed?

Error: completed cluster failed validation: spec.externalCloudControllerManager: Forbidden: AWS external CCM cannot be used without enabling spec.cloudConfig.AWSEBSCSIDriver.

6. What did you expect to happen?

The command to execute successfully, since I am not setting spec.cloudControllerManager at all. awsEBSCSIDriver.enabled is set to false because I want to install and manage it outside of kops.

7. Please provide your cluster manifest.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  creationTimestamp: "2022-03-01T19:30:28Z"
  generation: 1
  name: <redacted>
spec:
  additionalPolicies:
    master: |
      <redacted>
    node: |
      <redacted>
  api:
    loadBalancer:
      class: Classic
      idleTimeoutSeconds: 1800
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudConfig:
    awsEBSCSIDriver:
      enabled: false
    manageStorageClasses: false
  cloudProvider: aws
  configBase: <redacted>
  containerRuntime: containerd
  containerd:
    configOverride: |
      version = 2
      [plugins]
        [plugins."io.containerd.grpc.v1.cri"]
          [plugins."io.containerd.grpc.v1.cri".containerd]
            [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
              [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
                runtime_type = "io.containerd.runc.v2"
                [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
                  SystemdCgroup = true
  dnsZone: <redacted>
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-eu-west-1a
      name: a
    - encryptedVolume: true
      instanceGroup: master-eu-west-1b
      name: b
    - encryptedVolume: true
      instanceGroup: master-eu-west-1c
      name: c
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8081
      - name: ETCD_METRICS
        value: extended
      - name: ETCD_MANAGER_HOURLY_BACKUPS_RETENTION
        value: 1d
      - name: ETCD_MANAGER_DAILY_BACKUPS_RETENTION
        value: 30d
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-eu-west-1a
      name: a
    - encryptedVolume: true
      instanceGroup: master-eu-west-1b
      name: b
    - encryptedVolume: true
      instanceGroup: master-eu-west-1c
      name: c
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8082
      - name: ETCD_METRICS
        value: extended
      - name: ETCD_MANAGER_HOURLY_BACKUPS_RETENTION
        value: 1d
      - name: ETCD_MANAGER_DAILY_BACKUPS_RETENTION
        value: 7d
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
    serviceAccountExternalPermissions:
    - aws:
        policyARNs:
        - <redacted>
      name: cert-manager
      namespace: cert-manager
    - aws:
        policyARNs:
        - <redacted>
      name: cluster-autoscaler
      namespace: kube-system
    - aws:
        policyARNs:
        - <redacted>
      name: external-dns
      namespace: infra
    useServiceAccountExternalPermissions: true
  kubeAPIServer:
    auditLogMaxAge: 5
    auditLogMaxBackups: 1
    auditLogMaxSize: 100
    auditLogPath: /var/log/kube-apiserver-audit.log
    auditPolicyFile: /srv/kubernetes/kube-apiserver/audit.conf
    defaultNotReadyTolerationSeconds: 150
    defaultUnreachableTolerationSeconds: 150
    disableBasicAuth: true
    eventTTL: 6h0m0s
    logFormat: json
  kubeControllerManager:
    featureGates:
      CSIMigrationAWS: "true"
    horizontalPodAutoscalerDownscaleDelay: 3m0s
    horizontalPodAutoscalerSyncPeriod: 15s
    horizontalPodAutoscalerUpscaleDelay: 3m0s
    logFormat: json
  kubeDNS:
    nodeLocalDNS:
      enabled: true
    provider: CoreDNS
  kubeProxy:
    metricsBindAddress: 0.0.0.0
  kubeScheduler:
    logFormat: json
    usePolicyConfigMap: true
  kubelet:
    anonymousAuth: false
    cgroupDriver: systemd
    featureGates:
      CSIMigrationAWS: "true"
    logFormat: json
  kubernetesApiAccess:
  - <redacted>
  kubernetesVersion: 1.24.14
  masterInternalName: <redacted>
  masterPublicName: <redacted>
  networkCIDR: 10.252.0.0/17
  networking:
    canal: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  ntp:
    managed: false
  serviceAccountIssuerDiscovery:
    discoveryStore: <redacted>
    enableAWSOIDCProvider: true
  sshAccess:
  - <redacted>
  subnets:
  - cidr: 10.252.16.0/20
    name: eu-west-1a
    type: Private
    zone: eu-west-1a
  - cidr: 10.252.32.0/20
    name: eu-west-1b
    type: Private
    zone: eu-west-1b
  - cidr: 10.252.48.0/20
    name: eu-west-1c
    type: Private
    zone: eu-west-1c
  - cidr: 10.252.0.0/23
    name: utility-eu-west-1a
    type: Utility
    zone: eu-west-1a
  - cidr: 10.252.2.0/23
    name: utility-eu-west-1b
    type: Utility
    zone: eu-west-1b
  - cidr: 10.252.4.0/23
    name: utility-eu-west-1c
    type: Utility
    zone: eu-west-1c
  topology:
    bastion:
      bastionPublicName: <redacted>
      idleTimeoutSeconds: 1800
    dns:
      type: Public
    masters: private
    nodes: private

flopib commented Jun 01 '23 14:06

This was a design decision, for sure not a bug. @olemarkus do you remember why we chose this behaviour?

hakman commented Jun 02 '23 08:06

EBS CSI driver didn't support manual install in 1.24. It was added in 1.25 though: https://kops.sigs.k8s.io/addons/#self-managed-aws-ebs-csi-driver
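
For reference, the self-managed setup described on the linked page is driven by a separate managed field rather than enabled (a minimal sketch, assuming kOps 1.25+ and the v1alpha2 field names shown in this manifest; check the linked docs for the exact spelling in your version):

spec:
  cloudConfig:
    awsEBSCSIDriver:
      managed: false  # kOps does not install the driver; you install and manage it outside of kOps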

olemarkus commented Jun 06 '23 19:06

Right, the addition of support for a self-managed EBS CSI driver does indeed solve the issue, thanks!

Of course, I still get the same error if awsEBSCSIDriver.enabled is left set to false, although that configuration makes less sense in my case now that there is a separate managed parameter.

The error message also misled me into thinking that the external CCM was an opt-in setup controlled by the presence (or absence) of spec.cloudControllerManager in the cluster manifest.

flopib commented Jun 07 '23 20:06

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented Jan 22 '24 02:01

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented Feb 21 '24 02:02

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot commented Mar 22 '24 03:03

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot commented Mar 22 '24 03:03