sig-windows-tools
Calico not working on Windows following guide: "host must be a URL or a host:port pair: \"https://\""
I'm trying to set up a simple Kubernetes cluster for testing, using my laptop's Hyper-V as the base for the control plane as well as some nodes. The goal is a Kubernetes cluster with both Linux and Windows nodes, with the Windows nodes supporting gMSA.
I'm currently stuck getting the Windows node to handle a sample Windows pod.
I've followed the tutorial from https://github.com/kubernetes-sigs/sig-windows-tools/blob/master/guides/calico.md to get Calico installed. Note that I'm using Calico version 3.26.1 with Kubernetes version 1.29.0. Target Windows OS is Windows Server 2019.
At a glance, Calico seems to have been installed correctly, as all system and Calico pods are running:
mbender@kube-control:~$ kubectl get nodes -o=wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
kube-control Ready control-plane 20m v1.29.0 10.150.4.104 <none> Ubuntu 22.04.3 LTS 5.15.0-91-generic containerd://1.7.2
kube-node2 Ready <none> 16m v1.29.0 10.150.4.32 <none> Windows Server 2019 Standard 10.0.17763.5206 containerd://1.7.2
mbender@kube-control:~$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
calico-apiserver calico-apiserver-548b68f758-6tbnz 1/1 Running 0 19m
calico-apiserver calico-apiserver-548b68f758-bbwm4 1/1 Running 0 19m
calico-system calico-kube-controllers-6ddd76dbf7-lbn4d 1/1 Running 0 19m
calico-system calico-node-57g26 1/1 Running 0 10m
calico-system calico-typha-6f4959d889-fvbkg 1/1 Running 0 19m
calico-system csi-node-driver-js7dw 2/2 Running 0 19m
default windows-76cb69dfd7-gwczp 0/1 ContainerCreating 0 13m
kube-system calico-node-windows-nzmlf 2/2 Running 0 15m
kube-system coredns-76f75df574-jnknv 1/1 Running 0 20m
kube-system coredns-76f75df574-lknvb 1/1 Running 0 20m
kube-system etcd-kube-control 1/1 Running 0 20m
kube-system kube-apiserver-kube-control 1/1 Running 0 20m
kube-system kube-controller-manager-kube-control 1/1 Running 0 20m
kube-system kube-proxy-windows-4xffg 1/1 Running 0 15m
kube-system kube-proxy-wwlzh 1/1 Running 0 20m
kube-system kube-scheduler-kube-control 1/1 Running 0 20m
tigera-operator tigera-operator-94d7f7696-vnrvr 1/1 Running 0 19m
The list above contains a simple Windows pod which is stuck in ContainerCreating.
mbender@kube-control:~$ kubectl describe pod windows-76cb69dfd7-gwczp
Name: windows-76cb69dfd7-gwczp
Namespace: default
Priority: 0
Service Account: default
Node: kube-node2/10.150.4.32
Start Time: Fri, 29 Dec 2023 11:17:06 +0000
Labels: pod-template-hash=76cb69dfd7
run=windows
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/windows-76cb69dfd7
Containers:
iis:
Container ID:
Image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fzh56 (ro)
Conditions:
Type Status
PodReadyToStartContainers False
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-fzh56:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: kubernetes.io/os=windows
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14m default-scheduler Successfully assigned default/windows-76cb69dfd7-gwczp to kube-node2
Warning FailedCreatePodSandBox 14m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "547401c836fa05682aad8463750a404021439effc8357898f9e50c32d93f0802": plugin type="calico" failed (add): error creating calico client: host must be a URL or a host:port pair: "https://"
Normal SandboxChanged 4m2s (x47 over 14m) kubelet Pod sandbox changed, it will be killed and re-created.
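For what it's worth, the empty host in that error suggests the apiserver address on the Windows node was templated from empty KUBERNETES_SERVICE_HOST / KUBERNETES_SERVICE_PORT values (an assumption based on the error text, not confirmed from the Calico source). A minimal sketch of how the server URL degenerates:

```shell
# Hypothetical reconstruction: if the apiserver URL is assembled from the
# in-cluster env vars and both are unset/empty, the result is just "https://",
# which matches the "host must be a URL or a host:port pair" error above.
unset KUBERNETES_SERVICE_HOST KUBERNETES_SERVICE_PORT
HOST="${KUBERNETES_SERVICE_HOST:-}"
PORT="${KUBERNETES_SERVICE_PORT:-}"
SERVER="https://${HOST}${PORT:+:${PORT}}"
echo "${SERVER}"   # prints: https://
```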
For reference, the Windows pod definition I'm using:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: windows
  name: windows
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      run: windows
  template:
    metadata:
      labels:
        run: windows
    spec:
      containers:
      - image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019
        imagePullPolicy: Always
        name: iis
      nodeSelector:
        kubernetes.io/os: windows
To Reproduce
- Control-plane node is an Ubuntu Server 22.04 VM on Hyper-V, configured following guide https://iamunnip.hashnode.dev/building-a-kubernetes-v129-cluster-using-kubeadm, with the exception of the Calico install.
- Windows node is a Windows Server 2019
- Follow guide https://github.com/kubernetes-sigs/sig-windows-tools/blob/master/guides/guide-for-adding-windows-node.md
- Follow guide https://github.com/kubernetes-sigs/sig-windows-tools/blob/master/guides/calico.md
- Attempt to start a Windows pod
Expected behavior: Windows pod should start
Kubernetes:
- Windows Server version: 2019
- Kubernetes Version: 1.29.0
- CNI: Calico 3.26.1
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
- On the control plane, include --control-plane-endpoint in the "kubeadm init" call:
kubeadm init --control-plane-endpoint "[control plane ip]:6443"
This ensures the server:port address is set properly on the Windows node.
- If pods aren't getting created on the Windows node, it may be due to permission issues: the ClusterRole created for the Windows node at https://raw.githubusercontent.com/kubernetes-sigs/sig-windows-tools/master/hostprocess/calico/kube-calico-rbac.yml is missing a number of resources and verbs required by the Calico node.
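The endpoint can also be baked into a kubeadm config file instead of a flag; a minimal sketch, assuming the control-plane IP from the node listing above and kubeadm's v1beta3 API:

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
# control-plane IP from the node listing above; substitute your own
controlPlaneEndpoint: "10.150.4.104:6443"
networking:
  # assumption: Calico's default pod CIDR
  podSubnet: "192.168.0.0/16"
```

Then run kubeadm init --config with that file.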
- An easy fix is to create a new ClusterRoleBinding that makes the calico-node service account a cluster admin:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: calico-node-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: calico-node
  namespace: kube-system
- The proper fix for the ClusterRole is as follows:
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: calico-node-windows
rules:
- apiGroups: [""]
  resources:
  - namespaces
  - serviceaccounts
  verbs:
  - get
  - list
  - watch
- apiGroups: [""]
  resources:
  - pods/status
  verbs:
  - patch
- apiGroups: [""]
  resources:
  - pods
  - services
  - configmaps
  verbs:
  - get
  - list
  - watch
- apiGroups: [""]
  resources:
  - endpoints
  verbs:
  - get
- apiGroups: [""]
  resources:
  - nodes
  - nodes/status
  verbs:
  - get
  - list
  - update
  - watch
  - patch
- apiGroups: ["extensions"]
  resources:
  - networkpolicies
  verbs:
  - get
  - list
  - watch
- apiGroups: ["networking.k8s.io"]
  resources:
  - networkpolicies
  verbs:
  - watch
  - list
- apiGroups: ["crd.projectcalico.org"]
  resources:
  - felixconfigurations
  - bgppeers
  - bgpconfigurations
  - ippools
  - globalnetworkpolicies
  - globalnetworksets
  - networkpolicies
  - clusterinformations
  - hostendpoints
  - ipreservations
  - ipamblocks
  - ipamconfigs
  - blockaffinities
  - networksets
  - ipamhandles
  verbs:
  - create
  - get
  - list
  - update
  - watch
- apiGroups: ["discovery.k8s.io"]
  resources:
  - endpointslices
  verbs:
  - get
  - list
  - watch
- Note that I also had to rename the cluster role to "calico-node-windows", since it otherwise conflicts with the existing "calico-node" ClusterRole and ClusterRoleBinding; the ClusterRoleBinding therefore needs to be changed to:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: calico-node-windows
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: calico-node-windows
subjects:
- kind: ServiceAccount
  name: calico-node
  namespace: kube-system
- If the Windows pods just can't communicate with other pods, one possible fix is to update https://raw.githubusercontent.com/projectcalico/calico/$CALICO_VERSION/manifests/custom-resources.yaml and change the ipPools encapsulation from VXLANCrossSubnet to just VXLAN.
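A concrete sketch of that edit, run against a minimal stand-in for the ipPools section of custom-resources.yaml rather than the full downloaded manifest:

```shell
# Write a small excerpt resembling the Installation resource's ipPools block
# (field names per the Calico manifest referenced above; values illustrative).
cat > /tmp/ippool-excerpt.yaml <<'EOF'
    ipPools:
    - blockSize: 26
      cidr: 192.168.0.0/16
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
EOF
# Switch cross-subnet VXLAN to always-on VXLAN
sed -i 's/encapsulation: VXLANCrossSubnet/encapsulation: VXLAN/' /tmp/ippool-excerpt.yaml
grep 'encapsulation:' /tmp/ippool-excerpt.yaml   # now shows: encapsulation: VXLAN
```

After making the same change in the real custom-resources.yaml, re-apply it with kubectl as in the guide.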