Calico not working on Windows following guide: "host must be a URL or a host:port pair: "https://""

Open arp-mbender opened this issue 1 year ago • 2 comments

I'm trying to set up a simple Kubernetes cluster for testing, using my laptop's Hyper-V as the base for the control plane as well as some nodes. The goal is to have a Kubernetes cluster with both Linux and Windows nodes, with the Windows nodes supporting gMSA.

I'm currently stuck getting the Windows node to handle a sample Windows pod.

I've followed the tutorial from https://github.com/kubernetes-sigs/sig-windows-tools/blob/master/guides/calico.md to get Calico installed. Note that I'm using Calico version 3.26.1 with Kubernetes version 1.29.0. Target Windows OS is Windows Server 2019.

At a glance, Calico seems to have been installed correctly, as all nodes are Ready and all system and Calico pods are running:

mbender@kube-control:~$ kubectl get nodes -o=wide
NAME           STATUS   ROLES           AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                       KERNEL-VERSION      CONTAINER-RUNTIME
kube-control   Ready    control-plane   20m   v1.29.0   10.150.4.104   <none>        Ubuntu 22.04.3 LTS             5.15.0-91-generic   containerd://1.7.2
kube-node2     Ready    <none>          16m   v1.29.0   10.150.4.32    <none>        Windows Server 2019 Standard   10.0.17763.5206     containerd://1.7.2
mbender@kube-control:~$ kubectl get pods --all-namespaces
NAMESPACE          NAME                                       READY   STATUS              RESTARTS   AGE
calico-apiserver   calico-apiserver-548b68f758-6tbnz          1/1     Running             0          19m
calico-apiserver   calico-apiserver-548b68f758-bbwm4          1/1     Running             0          19m
calico-system      calico-kube-controllers-6ddd76dbf7-lbn4d   1/1     Running             0          19m
calico-system      calico-node-57g26                          1/1     Running             0          10m
calico-system      calico-typha-6f4959d889-fvbkg              1/1     Running             0          19m
calico-system      csi-node-driver-js7dw                      2/2     Running             0          19m
default            windows-76cb69dfd7-gwczp                   0/1     ContainerCreating   0          13m
kube-system        calico-node-windows-nzmlf                  2/2     Running             0          15m
kube-system        coredns-76f75df574-jnknv                   1/1     Running             0          20m
kube-system        coredns-76f75df574-lknvb                   1/1     Running             0          20m
kube-system        etcd-kube-control                          1/1     Running             0          20m
kube-system        kube-apiserver-kube-control                1/1     Running             0          20m
kube-system        kube-controller-manager-kube-control       1/1     Running             0          20m
kube-system        kube-proxy-windows-4xffg                   1/1     Running             0          15m
kube-system        kube-proxy-wwlzh                           1/1     Running             0          20m
kube-system        kube-scheduler-kube-control                1/1     Running             0          20m
tigera-operator    tigera-operator-94d7f7696-vnrvr            1/1     Running             0          19m

The list above contains a simple Windows pod which is stuck on ContainerCreating.

mbender@kube-control:~$ kubectl describe pod windows-76cb69dfd7-gwczp
Name:             windows-76cb69dfd7-gwczp
Namespace:        default
Priority:         0
Service Account:  default
Node:             kube-node2/10.150.4.32
Start Time:       Fri, 29 Dec 2023 11:17:06 +0000
Labels:           pod-template-hash=76cb69dfd7
                  run=windows
Annotations:      <none>
Status:           Pending
IP:
IPs:              <none>
Controlled By:    ReplicaSet/windows-76cb69dfd7
Containers:
  iis:
    Container ID:
    Image:          mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fzh56 (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   False
  Initialized                 True
  Ready                       False
  ContainersReady             False
  PodScheduled                True
Volumes:
  kube-api-access-fzh56:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=windows
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                  From               Message
  ----     ------                  ----                 ----               -------
  Normal   Scheduled               14m                  default-scheduler  Successfully assigned default/windows-76cb69dfd7-gwczp to kube-node2
  Warning  FailedCreatePodSandBox  14m                  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "547401c836fa05682aad8463750a404021439effc8357898f9e50c32d93f0802": plugin type="calico" failed (add): error creating calico client: host must be a URL or a host:port pair: "https://"
  Normal   SandboxChanged          4m2s (x47 over 14m)  kubelet            Pod sandbox changed, it will be killed and re-created.
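
The value quoted in the error is only the scheme with no host behind it, which suggests the API server address handed to the Calico client on the Windows node was empty. A minimal shell sketch of that kind of check follows. The helper name check_server is hypothetical, and this is not the validation code that Calico or client-go actually runs; it only illustrates why "https://" is rejected while a host:port pair is not.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not part of Calico):
# reports whether a server string contains a non-empty host.
check_server() {
  server=$1
  # Strip an optional scheme such as "https://".
  rest=${server#*://}
  # Keep only the host[:port] part, dropping any path.
  hostport=${rest%%/*}
  # Drop the port to look at the host alone.
  host=${hostport%%:*}
  if [ -z "$host" ]; then
    echo "invalid: empty host in \"$server\""
    return 1
  fi
  echo "ok: $hostport"
}

# Example calls:
#   check_server "https://"                    # reports an empty host
#   check_server "https://10.150.4.104:6443"   # accepted
```

If the address is empty at this point, the place to look is wherever the Windows node gets its API server host and port from, rather than the pod definition itself.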

For reference, the Windows pod definition I'm using:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: windows
  name: windows
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      run: windows
  template:
    metadata:
      labels:
        run: windows
    spec:
      containers:
      - image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019
        imagePullPolicy: Always
        name: iis
      nodeSelector:
        kubernetes.io/os: windows

To Reproduce

  1. Host node is an Ubuntu Server 22.04 VM on Hyper-V, configured following the guide at https://iamunnip.hashnode.dev/building-a-kubernetes-v129-cluster-using-kubeadm, with the exception of the Calico install.
  2. Windows node is Windows Server 2019.
  3. Follow guide https://github.com/kubernetes-sigs/sig-windows-tools/blob/master/guides/guide-for-adding-windows-node.md
  4. Follow guide https://github.com/kubernetes-sigs/sig-windows-tools/blob/master/guides/calico.md
  5. Attempt to start a Windows pod

Expected behavior: the Windows pod should start.

Kubernetes:

  • Windows Server version: 2019
  • Kubernetes Version: 1.29.0
  • CNI: Calico 3.26.1

arp-mbender avatar Dec 29 '23 11:12 arp-mbender

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Mar 28 '24 12:03 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Apr 27 '24 12:04 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar May 27 '24 12:05 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar May 27 '24 12:05 k8s-ci-robot

  • On the control plane, pass --control-plane-endpoint to the "kubeadm init" call, for example: kubeadm init --control-plane-endpoint "[control plane ip]:6443". This should ensure the server:port address is set properly on the Windows node.
  • If pods aren't being created on the Windows node, it may be due to permission issues. The ClusterRole created for the Windows node in https://raw.githubusercontent.com/kubernetes-sigs/sig-windows-tools/master/hostprocess/calico/kube-calico-rbac.yml is missing a number of resources and verbs that the calico node requires.
  • The easy fix is to create a new ClusterRoleBinding that makes the calico-node service account a cluster admin:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: calico-node-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: calico-node
    namespace: kube-system
  • The proper fix is to correct the ClusterRole as follows:
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: calico-node-windows
rules:
  - apiGroups: [""]
    resources:
      - namespaces
      - serviceaccounts
    verbs:
      - get
      - list
      - watch
  - apiGroups: [""]
    resources:
      - pods/status
    verbs:
      - patch
  - apiGroups: [""]
    resources:
      - pods
      - services
      - configmaps
    verbs:
      - get
      - list
      - watch
  - apiGroups: [""]
    resources:
      - endpoints
    verbs:
      - get
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/status
    verbs:
      - get
      - list
      - update
      - watch
      - patch
  - apiGroups: ["extensions"]
    resources:
      - networkpolicies
    verbs:
      - get
      - list
      - watch
  - apiGroups: ["networking.k8s.io"]
    resources:
      - networkpolicies
    verbs:
      - watch
      - list
  - apiGroups: ["crd.projectcalico.org"]
    resources:
      - felixconfigurations
      - bgppeers
      - bgpconfigurations
      - ippools
      - globalnetworkpolicies
      - globalnetworksets
      - networkpolicies
      - clusterinformations
      - hostendpoints
      - ipreservations
      - ipamblocks
      - ipamconfigs
      - blockaffinities
      - networksets
      - ipamhandles
    verbs:
      - create
      - get
      - list
      - update
      - watch
  - apiGroups: ["discovery.k8s.io"]
    resources:
      - endpointslices
    verbs:
      - get
      - list
      - watch
  • Note that I also had to rename the ClusterRole to "calico-node-windows", since the original name conflicts with the existing "calico-node" ClusterRole and ClusterRoleBinding. The ClusterRoleBinding therefore needs to be changed to:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: calico-node-windows
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: calico-node-windows
subjects:
  - kind: ServiceAccount
    name: calico-node
    namespace: kube-system
  • If the Windows pods simply can't communicate with other pods, one possible fix is to edit https://raw.githubusercontent.com/projectcalico/calico/$CALICO_VERSION/manifests/custom-resources.yaml and change the ipPools/encapsulation value from VXLANCrossSubnet to VXLAN.
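
The first two suggestions above can be checked before changing anything. The sketch below is a rough aid, not a tested procedure. The helper names control_plane_endpoint and print_rbac_checks are hypothetical. It assumes the cluster was created with kubeadm, so the kubeadm-config ConfigMap exists in kube-system, and that the Windows calico pods run as the calico-node service account in kube-system, as in the bindings above. Whether the Windows install reads the endpoint from that ConfigMap is also an assumption.

```shell
#!/bin/sh
# Hypothetical helper: reads a kubeadm ClusterConfiguration document on stdin
# and prints its controlPlaneEndpoint, or fails if the field is absent or empty.
control_plane_endpoint() {
  endpoint=$(sed -n 's/^controlPlaneEndpoint:[[:space:]]*//p' | tr -d '"')
  if [ -z "$endpoint" ]; then
    echo "controlPlaneEndpoint is not set" >&2
    return 1
  fi
  echo "$endpoint"
}

# Hypothetical helper: prints (does not run) one "kubectl auth can-i" command
# per resource and read verb, impersonating the calico-node service account.
print_rbac_checks() {
  sa="system:serviceaccount:kube-system:calico-node"
  for resource in "$@"; do
    for verb in get list watch; do
      echo "kubectl auth can-i $verb $resource --as=$sa"
    done
  done
}

# Usage against a live cluster (requires kubectl access):
#   kubectl -n kube-system get configmap kubeadm-config \
#     -o jsonpath='{.data.ClusterConfiguration}' | control_plane_endpoint
#   print_rbac_checks pods ipamblocks.crd.projectcalico.org | sh
```

An empty result from the first helper would be consistent with kubeadm init having been run without --control-plane-endpoint. A "no" from any of the printed commands would point at the missing ClusterRole rules.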

peer-qvannatter avatar Jun 06 '24 19:06 peer-qvannatter