Cannot run `kops update cluster` on AWS with no `spec.cloudControllerManager` set when the AWS EBS CSI driver is not managed by kops
/kind bug
**1. What kops version are you running? The command `kops version` will display this information.**
1.24.5
Update: the issue also happens with 1.26.3.
**2. What Kubernetes version are you running? `kubectl version` will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.**
1.24.14
**3. What cloud provider are you using?**
AWS
**4. What commands did you run? What is the simplest way to reproduce this issue?**
- Make sure `spec.cloudControllerManager` is not set in the `Cluster` manifest
- Set `spec.kubernetesVersion` to `1.24.x`
- Set `spec.cloudConfig.awsEBSCSIDriver.enabled: false`
- Run `kops replace -f manifest.yaml`
- Run `kops update cluster k8s.local --target=terraform` (see the sketch below)
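A minimal sketch of that sequence, assuming the state store is already configured via `KOPS_STATE_STORE` and the manifest is saved as `manifest.yaml`:

```sh
# Push the manifest with awsEBSCSIDriver.enabled: false and no cloudControllerManager block
kops replace -f manifest.yaml

# Render the changes; validation fails here with the error shown in the next section
kops update cluster k8s.local --target=terraform
```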
**5. What happened after the commands executed?**
Error: completed cluster failed validation: spec.externalCloudControllerManager: Forbidden: AWS external CCM cannot be used without enabling spec.cloudConfig.AWSEBSCSIDriver.
**6. What did you expect to happen?**
The command to complete successfully, since I am not setting `spec.cloudControllerManager` at all. `awsEBSCSIDriver.enabled` is set to `false` because I want to install and manage the driver outside of kops.
**7. Please provide your cluster manifest.**
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
creationTimestamp: "2022-03-01T19:30:28Z"
generation: 1
name: <redacted>
spec:
additionalPolicies:
master: |
<redacted>
node: |
<redacted>
api:
loadBalancer:
class: Classic
idleTimeoutSeconds: 1800
type: Public
authorization:
rbac: {}
channel: stable
cloudConfig:
awsEBSCSIDriver:
enabled: false
manageStorageClasses: false
cloudProvider: aws
configBase: <redacted>
containerRuntime: containerd
containerd:
configOverride: |
version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
[plugins."io.containerd.grpc.v1.cri".containerd]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
dnsZone: <redacted>
etcdClusters:
- cpuRequest: 200m
etcdMembers:
- encryptedVolume: true
instanceGroup: master-eu-west-1a
name: a
- encryptedVolume: true
instanceGroup: master-eu-west-1b
name: b
- encryptedVolume: true
instanceGroup: master-eu-west-1c
name: c
manager:
env:
- name: ETCD_LISTEN_METRICS_URLS
value: http://0.0.0.0:8081
- name: ETCD_METRICS
value: extended
- name: ETCD_MANAGER_HOURLY_BACKUPS_RETENTION
value: 1d
- name: ETCD_MANAGER_DAILY_BACKUPS_RETENTION
value: 30d
memoryRequest: 100Mi
name: main
- cpuRequest: 100m
etcdMembers:
- encryptedVolume: true
instanceGroup: master-eu-west-1a
name: a
- encryptedVolume: true
instanceGroup: master-eu-west-1b
name: b
- encryptedVolume: true
instanceGroup: master-eu-west-1c
name: c
manager:
env:
- name: ETCD_LISTEN_METRICS_URLS
value: http://0.0.0.0:8082
- name: ETCD_METRICS
value: extended
- name: ETCD_MANAGER_HOURLY_BACKUPS_RETENTION
value: 1d
- name: ETCD_MANAGER_DAILY_BACKUPS_RETENTION
value: 7d
memoryRequest: 100Mi
name: events
iam:
allowContainerRegistry: true
legacy: false
serviceAccountExternalPermissions:
- aws:
policyARNs:
- <redacted>
name: cert-manager
namespace: cert-manager
- aws:
policyARNs:
- <redacted>
name: cluster-autoscaler
namespace: kube-system
- aws:
policyARNs:
- <redacted>
name: external-dns
namespace: infra
useServiceAccountExternalPermissions: true
kubeAPIServer:
auditLogMaxAge: 5
auditLogMaxBackups: 1
auditLogMaxSize: 100
auditLogPath: /var/log/kube-apiserver-audit.log
auditPolicyFile: /srv/kubernetes/kube-apiserver/audit.conf
defaultNotReadyTolerationSeconds: 150
defaultUnreachableTolerationSeconds: 150
disableBasicAuth: true
eventTTL: 6h0m0s
logFormat: json
kubeControllerManager:
featureGates:
CSIMigrationAWS: "true"
horizontalPodAutoscalerDownscaleDelay: 3m0s
horizontalPodAutoscalerSyncPeriod: 15s
horizontalPodAutoscalerUpscaleDelay: 3m0s
logFormat: json
kubeDNS:
nodeLocalDNS:
enabled: true
provider: CoreDNS
kubeProxy:
metricsBindAddress: 0.0.0.0
kubeScheduler:
logFormat: json
usePolicyConfigMap: true
kubelet:
anonymousAuth: false
cgroupDriver: systemd
featureGates:
CSIMigrationAWS: "true"
logFormat: json
kubernetesApiAccess:
- <redacted>
kubernetesVersion: 1.24.14
masterInternalName: <redacted>
masterPublicName: <redacted>
networkCIDR: 10.252.0.0/17
networking:
canal: {}
nonMasqueradeCIDR: 100.64.0.0/10
ntp:
managed: false
serviceAccountIssuerDiscovery:
discoveryStore: <redacted>
enableAWSOIDCProvider: true
sshAccess:
- <redacted>
subnets:
- cidr: 10.252.16.0/20
name: eu-west-1a
type: Private
zone: eu-west-1a
- cidr: 10.252.32.0/20
name: eu-west-1b
type: Private
zone: eu-west-1b
- cidr: 10.252.48.0/20
name: eu-west-1c
type: Private
zone: eu-west-1c
- cidr: 10.252.0.0/23
name: utility-eu-west-1a
type: Utility
zone: eu-west-1a
- cidr: 10.252.2.0/23
name: utility-eu-west-1b
type: Utility
zone: eu-west-1b
- cidr: 10.252.4.0/23
name: utility-eu-west-1c
type: Utility
zone: eu-west-1c
topology:
bastion:
bastionPublicName: <redacted>
idleTimeoutSeconds: 1800
dns:
type: Public
masters: private
nodes: private
This was a design decision, for sure not a bug. @olemarkus do you remember why we chose this behaviour?
The EBS CSI driver didn't support a manual install in 1.24. Support was added in 1.25, though: https://kops.sigs.k8s.io/addons/#self-managed-aws-ebs-csi-driver
Right, the addition of support for a self-managed EBS CSI driver does indeed solve the issue, thanks!
Of course, I still get the same error if `awsEBSCSIDriver.enabled` is left set to `false`, although that configuration makes less sense in my case now that there is a separate `managed` parameter.
The error message also misled me into thinking that the external CCM was an opt-in setup controlled by the presence (or absence) of `spec.cloudControllerManager` in the cluster manifest.
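For completeness, a sketch of the self-managed setup from the linked docs, expressed in the same `spec.cloudConfig.awsEBSCSIDriver` block this manifest already uses; the `managed` field is the separate parameter mentioned above, and the exact spelling should be verified against the docs for your kops release:

```yaml
# Sketch only: tell kops not to install or manage the EBS CSI driver addon,
# so it can be installed and managed outside of kops (kops >= 1.25).
spec:
  cloudConfig:
    awsEBSCSIDriver:
      managed: false
```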
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with Issue Triage
> Please send feedback to sig-contributor-experience at kubernetes/community.
> /close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.