cloud-controller-manager - CrashLoopBackOff
I am using kOps on a GCE cluster.
The recent cluster update automatically changed the cloud-controller-manager image.
Change log:
ManagedFile/cluster.k8s.local-addons-gcp-cloud-controller.addons.k8s.io-k8s-1.23
Contents
name: KUBERNETES_SERVICE_HOST
value: 127.0.0.1
+ image: gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:master@sha256:b3ac9d2d9cff8d736473ab0297c57dfb1924b50758e5cc75a80bacd9d6568f8a
- image: gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:master@sha256:f575cc54d0ac3abf0c4c6e8306d6d809424e237e51f4a9f74575502be71c607c
imagePullPolicy: IfNotPresent
livenessProbe:
Because of this newly updated image (gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:master@sha256:b3ac9d2d9cff8d736473ab0297c57dfb1924b50758e5cc75a80bacd9d6568f8a), the cloud-controller-manager pod is crashing.
Log message in the pod:
flag provided but not defined: -allocate-node-cidrs
Usage of /go-runner:
  -also-stdout
        useful with log-file, log to standard output as well as the log file
  -log-file string
        If non-empty, save stdout to this file
  -redirect-stderr
        treat stderr same as stdout (default true)
On checking the image directly with
docker run --rm gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:master@sha256:b3ac9d2d9cff8d736473ab0297c57dfb1924b50758e5cc75a80bacd9d6568f8a --help
the help output does not list any of the flags below, which are set in the cloud-controller-manager DaemonSet (a verification sketch follows the args list):
- args:
  - --allocate-node-cidrs=true
  - --cidr-allocator-type=CloudAllocator
  - --cluster-cidr=************
  - --cluster-name=*************
  - --controllers=*
  - --leader-elect=true
  - --v=2
  - --cloud-provider=gce
  - --use-service-account-credentials=true
  - --cloud-config=/etc/kubernetes/cloud.config
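For what it's worth, here is a small sketch (assuming Docker is available locally; the grep pattern and the tag placeholder are illustrative, not from the original report) for checking whether a candidate image still defines the flags the DaemonSet passes before rolling it out:

# Candidate image to test; substitute the tag or digest you intend to pin
IMAGE=gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:<tag-or-digest>
# Print the binary's flag help and look for the flags from the DaemonSet args above
docker run --rm "$IMAGE" --help 2>&1 | grep -E 'allocate-node-cidrs|cidr-allocator-type|cloud-provider'
# No matches suggests the image's entrypoint would reject the DaemonSet args, as seen in the crash log above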
Please let us know how to fix this issue and how to avoid these automatic image version updates. The new image is breaking the cluster.
I am encountering this issue too. The initial output below points to a problem with the cloud-controller-manager, as reported by @RizwanaVyoma:
$ kops version
Client version: 1.31.0
$ kops validate cluster --wait 15m
...
Validation Failed
W0408 17:05:03.705924 1052446 validate_cluster.go:230] (will retry): cluster not yet healthy
I0408 17:05:14.154048 1052446 gce_cloud.go:307] Scanning zones: [us-east1-b us-east1-c us-east1-d]
INSTANCE GROUPS
NAME ROLE MACHINETYPE MIN MAX SUBNETS
control-plane-us-east1-b ControlPlane n1-standard-4 1 1 us-east1
nodes-us-east1-b Node n2-standard-8 2 2 us-east1
NODE STATUS
NAME ROLE READY
VALIDATION ERRORS
KIND NAME MESSAGE
Machine https://www.googleapis.com/compute/v1/projects/myprojectname/zones/us-east1-b/instances/control-plane-us-east1-b-mxhm machine "https://www.googleapis.com/compute/v1/projects/myprojectname/zones/us-east1-b/instances/control-plane-us-east1-b-mxhm" has not yet joined cluster
Machine https://www.googleapis.com/compute/v1/projects/myprojectname/zones/us-east1-b/instances/nodes-us-east1-b-h9tk machine "https://www.googleapis.com/compute/v1/projects/myprojectname/zones/us-east1-b/instances/nodes-us-east1-b-h9tk" has not yet joined cluster
Machine https://www.googleapis.com/compute/v1/projects/myprojectname/zones/us-east1-b/instances/nodes-us-east1-b-xt6w machine "https://www.googleapis.com/compute/v1/projects/myprojectname/zones/us-east1-b/instances/nodes-us-east1-b-xt6w" has not yet joined cluster
Pod kube-system/cloud-controller-manager-pzzk9 system-cluster-critical pod "cloud-controller-manager-pzzk9" is not ready (cloud-controller-manager)
Pod kube-system/coredns-autoscaler-56467f9769-ltzwk system-cluster-critical pod "coredns-autoscaler-56467f9769-ltzwk" is pending
Pod kube-system/coredns-db7b68989-59cw7 system-cluster-critical pod "coredns-db7b68989-59cw7" is pending
Validation Failed
W0408 17:05:14.944295 1052446 validate_cluster.go:230] (will retry): cluster not yet healthy
Error: validation failed: wait time exceeded during validation
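To confirm it is the same flag error as above, the crashing pod's previous logs can be checked (a quick sketch; the pod name is taken from the validation output above, and kubectl must be pointed at this cluster):

# Show the logs from the last crashed container of the cloud-controller-manager pod
kubectl -n kube-system logs cloud-controller-manager-pzzk9 --previous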
Can you try setting this in the cluster spec and run kops update cluster --yes to see if that fixes the issue?
spec:
  cloudControllerManager:
    image: gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:v32.2.4
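As a rough sketch of applying that pin end to end (these are standard kOps commands; whether a rolling update is needed afterwards depends on your setup, so the last step is an assumption):

# Edit the cluster spec and add the cloudControllerManager.image field shown above
kops edit cluster
# Apply the change to the addon manifests
kops update cluster --yes
# Possibly roll the control plane so the new manifest is picked up (may not be required)
kops rolling-update cluster --yes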
@rifelpet Yes, that works. Cluster validation now completes successfully. Thank you!
This was also fixed upstream in https://github.com/kubernetes/cloud-provider-gcp/pull/842
Confirming the workaround suggested by @rifelpet is no longer required. Thanks, all!
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.