cluster-api-provider-azure
cloud-node-manager-windows in Crashloopbackoff
/kind bug
What steps did you take and what happened:
Deployed the Azure cloud provider with:

```shell
helm install --repo https://raw.githubusercontent.com/kubernetes-sigs/cloud-provider-azure/master/helm/repo cloud-provider-azure --generate-name --set infra.clusterName=${CLUSTER_NAME}
```
cloud-node-manager-windows-n4hks 0/1 CrashLoopBackOff 38 (3h8m ago)
What did you expect to happen:
cloud-node-manager-windows runs without CrashLoopBackOff
Anything else you would like to add:
Here is the pod description and the logs:
Name: cloud-node-manager-windows-n4hks
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: win-p-win000000/10.1.0.6
Start Time: Wed, 01 Jun 2022 10:22:41 -0400
Labels: controller-revision-hash=5dd46f6bfb
k8s-app=cloud-node-manager-windows
pod-template-generation=1
Annotations: cluster-autoscaler.kubernetes.io/daemonset-pod: true
cni.projectcalico.org/containerID: 8bf61dbeba6c135c9de54edfbf422ffc5fee6a353502c941600f7727cd0f9414
cni.projectcalico.org/podIP: 192.168.152.140/32
cni.projectcalico.org/podIPs: 192.168.152.140/32
Status: Running
IP: 192.168.152.140
IPs:
IP: 192.168.152.140
Controlled By: DaemonSet/cloud-node-manager-windows
Containers:
cloud-node-manager:
Container ID: containerd://233de6e3a448a37782bede5e244b325fb46a18862857260c7ddb79805641b47b
Image: mcr.microsoft.com/oss/kubernetes/azure-cloud-node-manager:v1.23.11
Image ID: mcr.microsoft.com/oss/kubernetes/azure-cloud-node-manager@sha256:075ea1f8270312350f1396ab6677251e803e61a523822d5abfa5e6acd180cfab
Port:
Warning BackOff 3h12m (x545 over 5h16m) kubelet Back-off restarting failed container

root@CAPZ-Management:/home/bmadministrator/.kube# kubectl --kubeconfig=config logs cloud-node-manager-windows-n4hks -n kube-system
Failed to wait for apiserver being healthy: timed out waiting for the condition: failed to get apiserver /healthz status: Get "https://10.96.0.1:443/healthz": dial tcp 10.96.0.1:443: connectex: A socket operation was attempted to an unreachable network.
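The logged failure is a plain TCP dial to the in-cluster apiserver service IP (10.96.0.1:443), retried until a deadline and then reported as "timed out waiting for the condition". A minimal Python sketch of that probe-and-retry behavior (hypothetical helper names; the actual cloud-node-manager implementation is Go and this is only an illustration):

```python
import socket
import time

def can_reach(host, port, timeout=3.0):
    """Attempt one TCP connection, like the failing dial to 10.96.0.1:443."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # On Windows this surfaces as "connectex: A socket operation was
        # attempted to an unreachable network."
        return False

def wait_for_healthy(probe, timeout=10.0, interval=0.5):
    """Retry `probe` until it succeeds or the deadline passes, then fail
    with a "timed out waiting for the condition" style error."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return
        time.sleep(interval)
    raise TimeoutError("timed out waiting for the condition")
```

If a dial to 10.96.0.1:443 fails from the Windows node but succeeds from a Linux node, the problem is node-level pod networking (e.g. the CNI setup on the Windows node) rather than the apiserver itself.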
Environment:
- cluster-api-provider-azure version: cluster.x-k8s.io/v1beta1 1.3.1
- Kubernetes version (from `kubectl version`): 1.23.6
- OS (e.g. from /etc/os-release): the control plane runs Ubuntu Jammy Jellyfish; the node OS is the Windows Server 2019 CNCF image
https://github.com/kubernetes-sigs/cloud-provider-azure/issues/1807
kubernetes-sigs/cloud-provider-azure is regularly testing out-of-tree w/ both Linux and Windows.
@lzhecheng @nilo19 are you regularly seeing any of the above symptoms in any tests?
Is it because of the Calico CNI?
@jackfrancis Actually, those Windows tests are still blocked by a Calico installation failure... We will check whether this situation occurs after your "helm install calico" PR is merged. So far we are not aware of any such situation.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Mark this issue or PR as rotten with `/lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
In response to this:
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:
- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.