sig-windows-tools: Calico not working on Windows after following the guide (host must be a URL or a host:port pair: "https://")

I'm trying to set up a simple Kubernetes cluster for testing, using my laptop's Hyper-V as the base for the control plane as well as some nodes. The goal is a Kubernetes cluster with both Linux and Windows nodes, with the Windows nodes supporting gMSA.

I'm currently stuck getting the Windows node to run a sample Windows pod.

I've followed the tutorial from https://github.com/kubernetes-sigs/sig-windows-tools/blob/master/guides/calico.md to get Calico installed. Note that I'm using Calico version 3.26.1 with Kubernetes version 1.29.0. Target Windows OS is Windows Server 2019.

At a glance, Calico appeared to be installed correctly, since all system and Calico pods are running:

mbender@kube-control:~$ kubectl get nodes -o=wide
NAME           STATUS   ROLES           AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                       KERNEL-VERSION      CONTAINER-RUNTIME
kube-control   Ready    control-plane   20m   v1.29.0   10.150.4.104   <none>        Ubuntu 22.04.3 LTS             5.15.0-91-generic   containerd://1.7.2
kube-node2     Ready    <none>          16m   v1.29.0   10.150.4.32    <none>        Windows Server 2019 Standard   10.0.17763.5206     containerd://1.7.2
mbender@kube-control:~$ kubectl get pods --all-namespaces
NAMESPACE          NAME                                       READY   STATUS              RESTARTS   AGE
calico-apiserver   calico-apiserver-548b68f758-6tbnz          1/1     Running             0          19m
calico-apiserver   calico-apiserver-548b68f758-bbwm4          1/1     Running             0          19m
calico-system      calico-kube-controllers-6ddd76dbf7-lbn4d   1/1     Running             0          19m
calico-system      calico-node-57g26                          1/1     Running             0          10m
calico-system      calico-typha-6f4959d889-fvbkg              1/1     Running             0          19m
calico-system      csi-node-driver-js7dw                      2/2     Running             0          19m
default            windows-76cb69dfd7-gwczp                   0/1     ContainerCreating   0          13m
kube-system        calico-node-windows-nzmlf                  2/2     Running             0          15m
kube-system        coredns-76f75df574-jnknv                   1/1     Running             0          20m
kube-system        coredns-76f75df574-lknvb                   1/1     Running             0          20m
kube-system        etcd-kube-control                          1/1     Running             0          20m
kube-system        kube-apiserver-kube-control                1/1     Running             0          20m
kube-system        kube-controller-manager-kube-control       1/1     Running             0          20m
kube-system        kube-proxy-windows-4xffg                   1/1     Running             0          15m
kube-system        kube-proxy-wwlzh                           1/1     Running             0          20m
kube-system        kube-scheduler-kube-control                1/1     Running             0          20m
tigera-operator    tigera-operator-94d7f7696-vnrvr            1/1     Running             0          19m

The list above includes a simple Windows pod that is stuck in ContainerCreating:

mbender@kube-control:~$ kubectl describe pod windows-76cb69dfd7-gwczp
Name:             windows-76cb69dfd7-gwczp
Namespace:        default
Priority:         0
Service Account:  default
Node:             kube-node2/10.150.4.32
Start Time:       Fri, 29 Dec 2023 11:17:06 +0000
Labels:           pod-template-hash=76cb69dfd7
                  run=windows
Annotations:      <none>
Status:           Pending
IP:
IPs:              <none>
Controlled By:    ReplicaSet/windows-76cb69dfd7
Containers:
  iis:
    Container ID:
    Image:          mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fzh56 (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   False
  Initialized                 True
  Ready                       False
  ContainersReady             False
  PodScheduled                True
Volumes:
  kube-api-access-fzh56:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=windows
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                  From               Message
  ----     ------                  ----                 ----               -------
  Normal   Scheduled               14m                  default-scheduler  Successfully assigned default/windows-76cb69dfd7-gwczp to kube-node2
  Warning  FailedCreatePodSandBox  14m                  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "547401c836fa05682aad8463750a404021439effc8357898f9e50c32d93f0802": plugin type="calico" failed (add): error creating calico client: host must be a URL or a host:port pair: "https://"
  Normal   SandboxChanged          4m2s (x47 over 14m)  kubelet            Pod sandbox changed, it will be killed and re-created.

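The error text appears to come from the Kubernetes Go client (client-go), which refuses to build a client when the API server address has no host part. "https://" is only a scheme with nothing after it, which suggests the Calico CNI plugin on the Windows node was given an empty API server host and port. The shell function below is a rough approximation of that check, written for illustration only: the name check_server_url is made up, and this is not client-go's actual logic.

```shell
#!/usr/bin/env bash
# Rough approximation (NOT client-go's real implementation) of the check behind
# "host must be a URL or a host:port pair": a server address is only usable if
# something follows the scheme, e.g. https://HOST:PORT.
check_server_url() {
  url="$1"
  # Strip an optional http:// or https:// prefix, then any trailing path.
  rest="${url#http://}"
  rest="${rest#https://}"
  rest="${rest%%/*}"
  if [ -z "$rest" ]; then
    echo "invalid: host must be a URL or a host:port pair: \"$url\""
    return 1
  fi
  echo "ok: $url"
}
```

For example, check_server_url "https://10.150.4.104:6443" (the control-plane address from the node list above, assuming the default kubeadm API server port) reports ok, while check_server_url "https://" fails with the same message as the sandbox error.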
For reference, here is the Windows Deployment definition I'm using:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: windows
  name: windows
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      run: windows
  template:
    metadata:
      labels:
        run: windows
    spec:
      containers:
      - image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019
        imagePullPolicy: Always
        name: iis
      nodeSelector:
        kubernetes.io/os: windows

To Reproduce

The control-plane node is Ubuntu Server 22.04 installed on Hyper-V, configured following the guide https://iamunnip.hashnode.dev/building-a-kubernetes-v129-cluster-using-kubeadm except for the Calico installation.
The Windows node runs Windows Server 2019.
Follow the guide https://github.com/kubernetes-sigs/sig-windows-tools/blob/master/guides/guide-for-adding-windows-node.md
Follow the guide https://github.com/kubernetes-sigs/sig-windows-tools/blob/master/guides/calico.md
Attempt to start a Windows pod

Expected behavior: the Windows pod should start.

Kubernetes:

Windows Server version: 2019
Kubernetes Version: 1.29.0
CNI: Calico 3.26.1

Dec 29 '23 11:12 arp-mbender

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Mar 28 '24 12:03 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

Apr 27 '24 12:04 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue with /reopen
Mark this issue as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

May 27 '24 12:05 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

May 27 '24 12:05 k8s-ci-robot

On the control plane, pass --control-plane-endpoint to the "kubeadm init" call, e.g. kubeadm init --control-plane-endpoint "[control plane ip]:6443". This ensures the server:port address is set properly on the Windows node.
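The reply above links the empty address to kubeadm init having been run without --control-plane-endpoint. One way to check an existing cluster, sketched here under the assumption that it was built with kubeadm (check_control_plane_endpoint is a hypothetical helper name), is to read the ClusterConfiguration that kubeadm stores in the kubeadm-config ConfigMap and see whether controlPlaneEndpoint has a value:

```shell
#!/usr/bin/env bash
# Reads a kubeadm ClusterConfiguration document on stdin and reports the value
# of its top-level controlPlaneEndpoint field. A missing line or an empty ""
# value is treated as "not set".
check_control_plane_endpoint() {
  # Take the first matching line, drop the key, then drop any double quotes.
  endpoint="$(grep -m1 '^controlPlaneEndpoint:' \
    | sed 's/^controlPlaneEndpoint:[[:space:]]*//' \
    | tr -d '"')"
  if [ -z "$endpoint" ]; then
    echo "controlPlaneEndpoint is not set"
    return 1
  fi
  echo "controlPlaneEndpoint: $endpoint"
}

# Usage (requires cluster access):
#   kubectl get configmap kubeadm-config -n kube-system \
#     -o jsonpath='{.data.ClusterConfiguration}' | check_control_plane_endpoint
```

If it reports that the endpoint is not set, the suggestion above is to pass --control-plane-endpoint when running kubeadm init.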
If pods aren't getting created on the Windows node, it may be due to permission issues: the ClusterRole created for the Windows node by https://raw.githubusercontent.com/kubernetes-sigs/sig-windows-tools/master/hostprocess/calico/kube-calico-rbac.yml is missing a number of resources and verbs that calico-node requires.
The easy fix is to create a new ClusterRoleBinding that makes the calico-node service account a cluster admin:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: calico-node-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: calico-node
    namespace: kube-system

The proper fix is to extend the ClusterRole as follows:

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: calico-node-windows
rules:
  - apiGroups: [""]
    resources:
      - namespaces
      - serviceaccounts
    verbs:
      - get
      - list
      - watch
  - apiGroups: [""]
    resources:
      - pods/status
    verbs:
      - patch
  - apiGroups: [""]
    resources:
      - pods
      - services
      - configmaps
    verbs:
      - get
      - list
      - watch
  - apiGroups: [""]
    resources:
      - endpoints
    verbs:
      - get
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/status
    verbs:
      - get
      - list
      - update
      - watch
      - patch
  - apiGroups: ["extensions"]
    resources:
      - networkpolicies
    verbs:
      - get
      - list
      - watch
  - apiGroups: ["networking.k8s.io"]
    resources:
      - networkpolicies
    verbs:
      - watch
      - list
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - felixconfigurations
      - bgppeers
      - bgpconfigurations
      - ippools
      - globalnetworkpolicies
      - globalnetworksets
      - networkpolicies
      - clusterinformations
      - hostendpoints
      - ipreservations
      - ipamblocks
      - ipamconfigs
      - blockaffinities
      - networksets
      - ipamhandles
    verbs:
      - create
      - get
      - list
      - update
      - watch
  - apiGroups: ["discovery.k8s.io"]
    resources:
      - endpointslices
    verbs:
      - get
      - list
      - watch

Note that I also had to rename the ClusterRole to "calico-node-windows", because the name "calico-node" conflicts with the existing calico-node ClusterRole and ClusterRoleBinding. The ClusterRoleBinding therefore also needs to change to:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: calico-node-windows
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: calico-node-windows
subjects:
  - kind: ServiceAccount
    name: calico-node
    namespace: kube-system
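After applying the ClusterRole and ClusterRoleBinding, it can be worth confirming that the calico-node service account actually received the permissions. The sketch below spot-checks a handful of rules from the ClusterRole above using kubectl auth can-i with impersonation. The selection of checks and the function name check_calico_node_rbac are my own choices, and impersonating a service account requires an admin kubeconfig.

```shell
#!/usr/bin/env bash
# Spot-checks a few permissions granted by the calico-node-windows ClusterRole,
# impersonating the calico-node service account in kube-system.
# This is an illustrative selection of rules, not an exhaustive audit.
SA="system:serviceaccount:kube-system:calico-node"

check_calico_node_rbac() {
  local failed=0 check verb resource
  for check in "list namespaces" \
               "watch serviceaccounts" \
               "patch nodes" \
               "list endpointslices.discovery.k8s.io" \
               "create ipamblocks.crd.projectcalico.org"; do
    verb="${check%% *}"      # first word: the verb
    resource="${check#* }"   # remainder: resource, optionally resource.group
    # With --quiet, "kubectl auth can-i" prints nothing and exits 0 only if allowed.
    if kubectl auth can-i "$verb" "$resource" --as="$SA" --quiet 2>/dev/null; then
      echo "ok      $verb $resource"
    else
      echo "MISSING $verb $resource"
      failed=1
    fi
  done
  return "$failed"
}
```

Running check_calico_node_rbac prints one line per rule and returns non-zero if any of them is missing.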

If the Windows pods simply can't communicate with other pods, one possible fix is to update https://raw.githubusercontent.com/projectcalico/calico/$CALICO_VERSION/manifests/custom-resources.yaml and change the ipPools encapsulation from VXLANCrossSubnet to plain VXLAN.
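The encapsulation change can be scripted with sed before the manifest is applied. This is a minimal sketch assuming the downloaded custom-resources.yaml contains an "encapsulation: VXLANCrossSubnet" line under ipPools, as the reply describes. use_plain_vxlan is a hypothetical helper name, and CALICO_VERSION=v3.26.1 in the usage comment is an assumption based on the Calico version reported in the issue.

```shell
#!/usr/bin/env bash
# Switches Calico IP pool encapsulation from VXLANCrossSubnet to plain VXLAN.
# Reads the custom-resources.yaml manifest on stdin and writes the result to
# stdout; indentation and all other lines pass through unchanged.
use_plain_vxlan() {
  sed 's/\(encapsulation:[[:space:]]*\)VXLANCrossSubnet/\1VXLAN/'
}

# Usage, assuming the release tag for Calico 3.26.1 is v3.26.1:
#   CALICO_VERSION=v3.26.1
#   curl -sSL "https://raw.githubusercontent.com/projectcalico/calico/$CALICO_VERSION/manifests/custom-resources.yaml" \
#     | use_plain_vxlan > custom-resources.yaml
#   kubectl create -f custom-resources.yaml
```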

Jun 06 '24 19:06 peer-qvannatter
