
cloud-controller-manager - CrashLoopBackOff

RizwanaVyoma opened this issue 8 months ago • 5 comments

I am using kOps with a GCE cluster.

A recent cluster update automatically changed the cloud-controller-manager image.

Change log:

ManagedFile/cluster.k8s.local-addons-gcp-cloud-controller.addons.k8s.io-k8s-1.23
  Contents
              name: KUBERNETES_SERVICE_HOST
              value: 127.0.0.1
    +         image: gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:master@sha256:b3ac9d2d9cff8d736473ab0297c57dfb1924b50758e5cc75a80bacd9d6568f8a
    -         image: gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:master@sha256:f575cc54d0ac3abf0c4c6e8306d6d809424e237e51f4a9f74575502be71c607c
              imagePullPolicy: IfNotPresent
              livenessProbe:

Because of this newly updated image, gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:master@sha256:b3ac9d2d9cff8d736473ab0297c57dfb1924b50758e5cc75a80bacd9d6568f8a, the cloud-controller-manager pod is crashing.

Log message in the pod:

flag provided but not defined: -allocate-node-cidrs
Usage of /go-runner:
  -also-stdout
        useful with log-file, log to standard output as well as the log file
  -log-file string
        If non-empty, save stdout to this file
  -redirect-stderr
        treat stderr same as stdout (default true)
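
For anyone hitting the same crash, the crashed container's output and events can be pulled with standard kubectl commands. A minimal sketch (the pod name is a placeholder; use the one from your cluster):

$ kubectl -n kube-system get pods | grep cloud-controller-manager
$ kubectl -n kube-system logs <pod-name> --previous
$ kubectl -n kube-system describe pod <pod-name>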

Checking the image with

$ docker run --rm gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:master@sha256:b3ac9d2d9cff8d736473ab0297c57dfb1924b50758e5cc75a80bacd9d6568f8a --help

did not list any of the flags below, which the cloud-controller-manager DaemonSet passes as args:

args:
  - --allocate-node-cidrs=true
  - --cidr-allocator-type=CloudAllocator
  - --cluster-cidr=************
  - --cluster-name=*************
  - --controllers=*
  - --leader-elect=true
  - --v=2
  - --cloud-provider=gce
  - --use-service-account-credentials=true
  - --cloud-config=/etc/kubernetes/cloud.config
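
One way to confirm the mismatch is to diff the DaemonSet's args against the flags the image actually accepts. A minimal sketch, assuming the DaemonSet is named cloud-controller-manager in kube-system:

$ IMAGE=gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:master@sha256:b3ac9d2d9cff8d736473ab0297c57dfb1924b50758e5cc75a80bacd9d6568f8a
$ kubectl -n kube-system get ds cloud-controller-manager -o jsonpath='{.spec.template.spec.containers[0].args}'
$ docker run --rm "$IMAGE" --help 2>&1 | grep 'allocate-node-cidrs' || echo "flag not supported by this image"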

Please let us know how to fix this issue, and how to avoid these automatic image updates. The new image is breaking the cluster.

RizwanaVyoma avatar Apr 08 '25 11:04 RizwanaVyoma

I am encountering this issue too, with the following initial output pointing to the same cloud-controller-manager problem reported by @RizwanaVyoma:

$ kops version
Client version: 1.31.0

$ kops validate cluster --wait 15m
...

Validation Failed
W0408 17:05:03.705924 1052446 validate_cluster.go:230] (will retry): cluster not yet healthy
I0408 17:05:14.154048 1052446 gce_cloud.go:307] Scanning zones: [us-east1-b us-east1-c us-east1-d]
INSTANCE GROUPS
NAME                            ROLE            MACHINETYPE     MIN     MAX     SUBNETS
control-plane-us-east1-b        ControlPlane    n1-standard-4   1       1       us-east1
nodes-us-east1-b                Node            n2-standard-8   2       2       us-east1

NODE STATUS
NAME    ROLE    READY

VALIDATION ERRORS
KIND    NAME                                                                                                                                    MESSAGE
Machine https://www.googleapis.com/compute/v1/projects/myprojectname/zones/us-east1-b/instances/control-plane-us-east1-b-mxhm        machine "https://www.googleapis.com/compute/v1/projects/myprojectname/zones/us-east1-b/instances/control-plane-us-east1-b-mxhm" has not yet joined cluster
Machine https://www.googleapis.com/compute/v1/projects/myprojectname/zones/us-east1-b/instances/nodes-us-east1-b-h9tk                machine "https://www.googleapis.com/compute/v1/projects/myprojectname/zones/us-east1-b/instances/nodes-us-east1-b-h9tk" has not yet joined cluster
Machine https://www.googleapis.com/compute/v1/projects/myprojectname/zones/us-east1-b/instances/nodes-us-east1-b-xt6w                machine "https://www.googleapis.com/compute/v1/projects/myprojectname/zones/us-east1-b/instances/nodes-us-east1-b-xt6w" has not yet joined cluster
Pod     kube-system/cloud-controller-manager-pzzk9                                                                                              system-cluster-critical pod "cloud-controller-manager-pzzk9" is not ready (cloud-controller-manager)
Pod     kube-system/coredns-autoscaler-56467f9769-ltzwk                                                                                         system-cluster-critical pod "coredns-autoscaler-56467f9769-ltzwk" is pending
Pod     kube-system/coredns-db7b68989-59cw7                                                                                                     system-cluster-critical pod "coredns-db7b68989-59cw7" is pending

Validation Failed
W0408 17:05:14.944295 1052446 validate_cluster.go:230] (will retry): cluster not yet healthy
Error: validation failed: wait time exceeded during validation

nevdullcode avatar Apr 08 '25 17:04 nevdullcode

Can you try setting this in the cluster spec and running kops update cluster --yes to see if that fixes the issue?

spec:
  cloudControllerManager:
    image: gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:v32.2.4
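
For reference, a minimal sequence to apply the pin, assuming the state store and cluster name are already set (e.g. via KOPS_STATE_STORE and KOPS_CLUSTER_NAME):

$ kops edit cluster            # add the cloudControllerManager.image field shown above
$ kops update cluster --yes    # re-render and apply the addon manifest
$ kops validate cluster --wait 10m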

rifelpet avatar Apr 09 '25 00:04 rifelpet

@rifelpet Yes, that works. Cluster validation now completes successfully. Thank you!

nevdullcode avatar Apr 09 '25 15:04 nevdullcode

This was also fixed upstream in https://github.com/kubernetes/cloud-provider-gcp/pull/842.

hakman avatar Apr 27 '25 05:04 hakman

Confirming the workaround suggested by @rifelpet is no longer required. Thanks, all!

nevdullcode avatar May 12 '25 15:05 nevdullcode

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Aug 10 '25 15:08 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Sep 09 '25 15:09 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Oct 09 '25 16:10 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Oct 09 '25 16:10 k8s-ci-robot