
Networking issues with Calico on Windows nodes - no internet connectivity

Open Breee opened this issue 4 months ago • 1 comments

Describe the bug

  • All Nodes can ping all other nodes --> Good
  • Linux pods work as expected --> Good
  • Pods started on Windows nodes have no network connectivity (not even DNS) --> BAD
$ k exec iis-demo-795f98f84d-lmrx5 -it --  nslookup google.com
DNS request timed out.
    timeout was 2 seconds.
Server:  UnKnown
Address:  10.42.0.10

DNS request timed out.
    timeout was 2 seconds.
DNS request timed out.
    timeout was 2 seconds.
DNS request timed out.
    timeout was 2 seconds.

$ k exec iis-demo-795f98f84d-lmrx5 -it -- ipconfig

Windows IP Configuration
Ethernet adapter vEthernet (8cc68cfad871ed1e55d5408e88553e50ecbf1e420975154524012f33b9ecf69c_Calico):
   Connection-specific DNS Suffix  . : default.svc.cluster.local
   Link-local IPv6 Address . . . . . : fe80::854b:7b85:43c9:44a7%34
   IPv4 Address. . . . . . . . . . . : 10.42.249.79
   Subnet Mask . . . . . . . . . . . : 255.255.255.192
   Default Gateway . . . . . . . . . : 10.42.249.65
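
The addresses in the two outputs above can be cross-checked against the CIDRs configured later in this report. The following is only a sanity-check sketch using Python's standard `ipaddress` module; every value is copied from this issue, and it says nothing about why the DNS requests time out.

```python
import ipaddress

# Values copied from the nslookup and ipconfig output above.
gateway = ipaddress.ip_address("10.42.249.65")
dns_server = ipaddress.ip_address("10.42.0.10")

# Pod subnet derived from the pod IP and the reported mask 255.255.255.192.
pod_subnet = ipaddress.ip_interface("10.42.249.79/255.255.255.192").network

# CIDRs from the Cluster manifest and the Calico values in this report.
pod_pool = ipaddress.ip_network("10.42.128.0/17")
service_cidr = ipaddress.ip_network("10.42.0.0/17")

checks = {
    "gateway in pod subnet": gateway in pod_subnet,
    "pod subnet inside IP pool": pod_subnet.subnet_of(pod_pool),
    "DNS server in service CIDR": dns_server in service_cidr,
    "DNS server outside pod pool": dns_server not in pod_pool,
}

print(f"pod subnet: {pod_subnet}")
for name, result in checks.items():
    print(f"{name}: {result}")
```

If all checks print `True`, the pod received an address from the expected /26 block and is pointed at a DNS address inside the service range, which suggests the addressing itself is consistent with the configuration.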

To Reproduce

Cluster API manifest for kubeadm

---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: ${CI_COMMIT_REF_NAME}
    cni-windows: calico
    windows: enabled
  name: ${CI_COMMIT_REF_NAME}
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 10.42.128.0/17
    services:
      cidrBlocks:
      - 10.42.0.0/17
 [..]
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: ${K8S_CLUSTER}-windows-no-domain-join
  namespace: default
spec:
  template:
    spec:
      verbosity: 5
      files:
      - content: |-
          Add-MpPreference -ExclusionProcess C:/opt/cni/bin/calico.exe
          Add-MpPreference -ExclusionProcess C:/opt/cni/bin/calico-ipam.exe
        path: C:/defender-exclude-calico.ps1
        permissions: "0744"
      joinConfiguration:
        nodeRegistration:
          criSocket: npipe:////./pipe/containerd-containerd
          ignorePreflightErrors: ["IsPrivilegedUser"]
          kubeletExtraArgs:
            cloud-provider: external
            v: "2"
            windows-priorityclass: ABOVE_NORMAL_PRIORITY_CLASS
            hostname-override: '{{ ds.meta_data["local_hostname"] }}'
          name: '{{ ds.meta_data["local_hostname"] }}'
      preKubeadmCommands:
      - mklink /d c:\etc\kubernetes\ssl c:\etc\kubernetes\pki
      postKubeadmCommands:
      - nssm set kubelet start SERVICE_AUTO_START
      - powershell C:/defender-exclude-calico.ps1

calico installation values:

installation:
  enabled: true
  # Configures Calico networking.
  serviceCIDRs:
    - 10.42.0.0/17
  calicoNetwork:
    bgp: Disabled
    linuxDataplane: Iptables
    windowsDataplane: HNS
    ipPools:
    - name: default-ipv4-ippool
      blockSize: 26
      cidr: 10.42.128.0/17
      encapsulation: VXLAN
      natOutgoing: Enabled
      nodeSelector: all()

apiServer:
  enabled: true
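
As a quick check of the address plan used in the manifest and values above, the sketch below confirms that the pod and service ranges do not overlap and works out how many /26 blocks the pool can hand out. It uses only the CIDRs and `blockSize` shown in this report.

```python
import ipaddress

pod_pool = ipaddress.ip_network("10.42.128.0/17")    # clusterNetwork.pods and ipPools[0].cidr
service_cidr = ipaddress.ip_network("10.42.0.0/17")  # clusterNetwork.services and serviceCIDRs
block_size = 26                                      # ipPools[0].blockSize

# The two /17 ranges are the lower and upper halves of 10.42.0.0/16.
overlap = pod_pool.overlaps(service_cidr)

# Number of /26 blocks in the /17 pool, and addresses in each block.
block_count = 2 ** (block_size - pod_pool.prefixlen)
addresses_per_block = 2 ** (32 - block_size)

print(f"pod pool and service CIDR overlap: {overlap}")
print(f"/{block_size} blocks in pod pool: {block_count}")
print(f"addresses per block: {addresses_per_block}")
```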

script:

      export CALICO_VERSION="v3.28.1"
      kubectl apply --server-side --force-conflicts -f https://raw.githubusercontent.com/projectcalico/calico/${CALICO_VERSION}/manifests/operator-crds.yaml
      helm repo add projectcalico https://docs.tigera.io/calico/charts
      kubectl apply -f calico/namespace.yaml
      cat calico/endpoints.yaml | \
        envsubst | \
        kubectl apply -f -
      cat calico/values.yaml | \
        envsubst | \
        helm install --version ${CALICO_VERSION} --namespace tigera-operator calico projectcalico/tigera-operator --values -
      sleep 30
      while ! kubectl get installation default; do
          echo "Waiting for installation default to exist..."
          sleep 10
      done
      while ! kubectl get ippool default-ipv4-ippool; do
          echo "Waiting for ippool default-ipv4-ippool to exist..."
          sleep 10
      done
      kubectl patch ippool default-ipv4-ippool --type='json' -p='[{"op": "replace", "path": "/spec/vxlanMode", "value": "Always"}]'
      while ! kubectl get ipamconfig default; do
          echo "Waiting for ipamconfig default to exist..."
          sleep 10
      done
      kubectl patch ipamconfig default --type merge --patch='{"spec": {"strictAffinity": true}}'
      curl -L https://raw.githubusercontent.com/kubernetes-sigs/sig-windows-tools/master/hostprocess/calico/kube-proxy/kube-proxy.yml | sed 's/KUBE_PROXY_VERSION/v1.31.1/g' | kubectl apply -f -

full rendered installation:

apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  annotations:
    meta.helm.sh/release-name: calico
    meta.helm.sh/release-namespace: tigera-operator
  creationTimestamp: "2024-10-09T11:56:28Z"
  finalizers:
  - operator.tigera.io/installation-controller
  - tigera.io/operator-cleanup
  - operator.tigera.io/apiserver-controller
  generation: 5
  labels:
    app.kubernetes.io/managed-by: Helm
  name: default
  resourceVersion: "786311"
  uid: 7dcb3345-8343-4fe8-8133-3c74e193515c
spec:
  calicoNetwork:
    bgp: Disabled
    hostPorts: Enabled
    ipPools:
    - allowedUses:
      - Workload
      - Tunnel
      blockSize: 26
      cidr: 10.42.128.0/17
      disableBGPExport: false
      encapsulation: VXLAN
      name: default-ipv4-ippool
      natOutgoing: Enabled
      nodeSelector: all()
    linuxDataplane: Iptables
    linuxPolicySetupTimeoutSeconds: 0
    multiInterfaceMode: None
    nodeAddressAutodetectionV4:
      kubernetes: NodeInternalIP
    windowsDataplane: HNS
  cni:
    ipam:
      type: Calico
    type: Calico
  controlPlaneReplicas: 2
  flexVolumePath: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
  imagePullSecrets: []
  kubeletVolumePluginPath: /var/lib/kubelet
  kubernetesProvider: ""
  logging:
    cni:
      logFileMaxAgeDays: 30
      logFileMaxCount: 10
      logFileMaxSize: 100Mi
      logSeverity: Info
  nodeUpdateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
  nonPrivileged: Disabled
  serviceCIDRs:
  - 10.42.0.0/17
  variant: Calico
  windowsNodes:
    cniBinDir: /opt/cni/bin
    cniConfigDir: /etc/cni/net.d
    cniLogDir: /var/log/calico/cni
status:
  calicoVersion: v3.28.1
  computed:
    calicoNetwork:
      bgp: Disabled
      hostPorts: Enabled
      ipPools:
      - allowedUses:
        - Workload
        - Tunnel
        blockSize: 26
        cidr: 10.42.128.0/17
        disableBGPExport: false
        encapsulation: VXLAN
        name: default-ipv4-ippool
        natOutgoing: Enabled
        nodeSelector: all()
      linuxDataplane: Iptables
      linuxPolicySetupTimeoutSeconds: 0
      multiInterfaceMode: None
      nodeAddressAutodetectionV4:
        kubernetes: NodeInternalIP
      windowsDataplane: HNS
    cni:
      ipam:
        type: Calico
      type: Calico
    controlPlaneReplicas: 2
    flexVolumePath: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/
    kubeletVolumePluginPath: /var/lib/kubelet
    logging:
      cni:
        logFileMaxAgeDays: 30
        logFileMaxCount: 10
        logFileMaxSize: 100Mi
        logSeverity: Info
    nodeUpdateStrategy:
      rollingUpdate:
        maxUnavailable: 1
      type: RollingUpdate
    nonPrivileged: Disabled
    serviceCIDRs:
    - 10.42.0.0/17
    variant: Calico
    windowsNodes:
      cniBinDir: /opt/cni/bin
      cniConfigDir: /etc/cni/net.d
      cniLogDir: /var/log/calico/cni
  conditions:
  - lastTransitionTime: "2024-10-11T06:36:14Z"
    message: All Objects Available
    observedGeneration: 5
    reason: AllObjectsAvailable
    status: "False"
    type: Degraded
  - lastTransitionTime: "2024-10-11T06:36:14Z"
    message: All objects available
    observedGeneration: 5
    reason: AllObjectsAvailable
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-10-11T06:36:14Z"
    message: All Objects Available
    observedGeneration: 5
    reason: AllObjectsAvailable
    status: "False"
    type: Progressing
  mtu: 1450
  variant: Calico

IPAM

$ k get ipamconfig default -o yaml
apiVersion: crd.projectcalico.org/v1
kind: IPAMConfig
metadata:
  annotations:
    projectcalico.org/metadata: '{"creationTimestamp":null}'
  creationTimestamp: "2024-10-09T11:57:01Z"
  generation: 2
  name: default
  resourceVersion: "885"
  uid: d17c15b9-2059-4a2a-b8f0-87614d2570b3
spec:
  autoAllocateBlocks: true
  strictAffinity: true

pool

$ k get ippools. default-ipv4-ippool  -o yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  creationTimestamp: "2024-10-09T11:56:34Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: tigera-operator
  name: default-ipv4-ippool
  resourceVersion: "1455"
  uid: fa602211-17ff-4a75-aca4-099d0983e81e
spec:
  allowedUses:
  - Workload
  - Tunnel
  blockSize: 26
  cidr: 10.42.128.0/17
  ipipMode: Never
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Always

Expected behavior

Pods on Windows nodes have internet connectivity and working DNS resolution.

Kubernetes (please complete the following information):

  • Windows Server version: Windows Server 2022 Standard 10.0.20348.2700
  • Kubernetes Version: 1.31.1
  • CNI: calico v3.28.1

Additional context

calico-windows-node: attached as log file caliconode.log

annotations:

$ k get nodes -o yaml | grep projectcalico.org
      projectcalico.org/IPv4Address: 10.13.18.42/24
      projectcalico.org/IPv4VXLANTunnelAddr: 10.42.183.128
      projectcalico.org/IPv4Address: 10.13.18.64/24
      projectcalico.org/IPv4VXLANTunnelAddr: 10.42.170.192
      projectcalico.org/IPv4Address: 10.13.18.9/24
      projectcalico.org/IPv4VXLANTunnelAddr: 10.42.254.64
      projectcalico.org/IPv4Address: 10.13.18.8/24
      projectcalico.org/IPv4VXLANTunnelAddr: 10.42.237.128
      projectcalico.org/IPv4Address: 10.12.12.157/16
      projectcalico.org/IPv4VXLANTunnelAddr: 10.42.249.65
      projectcalico.org/VXLANTunnelMACAddr: 00:15:5d:6c:b8:ef
      projectcalico.org/IPv4Address: 10.12.12.35/16
      projectcalico.org/IPv4VXLANTunnelAddr: 10.42.215.129
      projectcalico.org/VXLANTunnelMACAddr: 00:15:5d:16:e4:e5

nodes

$ k get nodes -o wide
NAME                      STATUS   ROLES           AGE   VERSION   INTERNAL-IP    EXTERNAL-IP    OS-IMAGE                       KERNEL-VERSION      CONTAINER-RUNTIME
clippy-md-0-2kt4j-4wthh   Ready    <none>          43h   v1.31.1   10.13.18.42    10.13.18.42    Ubuntu 20.04.6 LTS             5.4.0-195-generic   containerd://1.7.22
clippy-md-0-2kt4j-9l6xw   Ready    <none>          43h   v1.31.1   10.13.18.64    10.13.18.64    Ubuntu 20.04.6 LTS             5.4.0-195-generic   containerd://1.7.22
clippy-md-0-2kt4j-ffmgj   Ready    <none>          43h   v1.31.1   10.13.18.9     10.13.18.9     Ubuntu 20.04.6 LTS             5.4.0-195-generic   containerd://1.7.22
clippy-nn78j              Ready    control-plane   43h   v1.31.1   10.13.18.8     10.13.18.8     Ubuntu 20.04.6 LTS             5.4.0-195-generic   containerd://1.7.22
cw2-ptmtr-nwbvs           Ready    <none>          25h   v1.31.1   10.12.12.157   10.12.12.157   Windows Server 2022 Standard   10.0.20348.2700     containerd://1.7.22
win-w85jm-ldgdk           Ready    <none>          63m   v1.31.1   10.12.12.35    10.12.12.35    Windows Server 2022 Standard   10.0.20348.2700     containerd://1.7.22

HNS networks (on the Windows node with ManagementIP 10.12.12.157)

$ Get-HnsNetwork
ActivityId             : 9F7B4A29-2870-42B5-B73B-10FF10F5ADB5
AdditionalParams       :
CurrentEndpointCount   : 0
DNSServerCompartment   : 3
DrMacAddress           : 00-15-5D-6C-B8-EF
Extensions             : {@{Id=E7C3B2F0-F3C5-48DF-AF2B-10FED6D72E7A; IsEnabled=False;
                         Name=Microsoft Windows Filtering Platform},
                         @{Id=F74F241B-440F-4433-BB28-00F89EAD20D8; IsEnabled=True;
                         Name=Microsoft Azure VFP Switch Extension},
                         @{Id=430BDADD-BAB0-41AB-A369-94B67FA5BE0A; IsEnabled=True;
                         Name=Microsoft NDIS Capture}}
Flags                  : 0
Health                 : @{LastErrorCode=0; LastUpdateTime=133731058207475297}
ID                     : 7DA4BC16-24A3-4A60-92FB-EAF3196E1FA0
IPv6                   : False
LayeredOn              : 8FA2C2D1-16E0-4693-9DCB-10B8459164D1
MacPools               : {@{EndMacAddress=00-15-5D-C8-FF-FF;
                         StartMacAddress=00-15-5D-C8-F0-00}}
ManagementIP           : 10.12.12.157
MaxConcurrentEndpoints : 0
Name                   : External
Policies               : {}
State                  : 1
Subnets                : {@{AdditionalParams=; AddressPrefix=192.168.255.0/30; Flags=0;
                         GatewayAddress=192.168.255.1; Health=;
                         ID=CA943AB7-B803-4CF8-B107-85B8FEAF949C;
                         IpSubnets=System.Object[]; ObjectType=5;
                         Policies=System.Object[]; State=0}}
TotalEndpoints         : 0
Type                   : Overlay
Version                : 55834574851
Resources              : @{AdditionalParams=; AllocationOrder=1;
                         Allocators=System.Object[]; CompartmentOperationTime=0; Flags=0;
                         Health=; ID=9F7B4A29-2870-42B5-B73B-10FF10F5ADB5;
                         PortOperationTime=0; State=1; SwitchOperationTime=0;
                         VfpOperationTime=0; parentId=D26B2287-32EE-41BD-A150-EDA2DBB20A30}

ActivityId             : 8C92A66D-7817-40AE-AFE6-0BB5824D54D9
AdditionalParams       :
CurrentEndpointCount   : 0
DNSServerCompartment   : 4
DrMacAddress           : 00-15-5D-6C-B8-EF
Extensions             : {@{Id=E7C3B2F0-F3C5-48DF-AF2B-10FED6D72E7A; IsEnabled=False;
                         Name=Microsoft Windows Filtering Platform},
                         @{Id=F74F241B-440F-4433-BB28-00F89EAD20D8; IsEnabled=True;
                         Name=Microsoft Azure VFP Switch Extension},
                         @{Id=430BDADD-BAB0-41AB-A369-94B67FA5BE0A; IsEnabled=True;
                         Name=Microsoft NDIS Capture}}
Flags                  : 0
Health                 : @{LastErrorCode=0; LastUpdateTime=133731058414671503}
ID                     : D1DBE980-BE3E-4049-AA5D-93A4EBEF45B2
IPv6                   : False
LayeredOn              : 8FA2C2D1-16E0-4693-9DCB-10B8459164D1
MacPools               : {@{EndMacAddress=00-15-5D-55-1F-FF;
                         StartMacAddress=00-15-5D-55-10-00}}
ManagementIP           : 10.12.12.157
MaxConcurrentEndpoints : 1
Name                   : Calico
Policies               : {@{DestinationPrefix=10.42.183.128/26;
                         DistributedRouterMacAddress=66-ef-b3-b4-4c-c8; IsolationId=4096;
                         ProviderAddress=10.13.18.42; Type=RemoteSubnetRoute},
                         @{DestinationPrefix=10.42.237.128/26;
                         DistributedRouterMacAddress=66-52-57-21-f1-b0; IsolationId=4096;
                         ProviderAddress=10.13.18.8; Type=RemoteSubnetRoute},
                         @{DestinationPrefix=10.42.170.192/26;
                         DistributedRouterMacAddress=66-84-f6-b1-67-9c; IsolationId=4096;
                         ProviderAddress=10.13.18.64; Type=RemoteSubnetRoute},
                         @{DestinationPrefix=10.42.215.128/26;
                         DistributedRouterMacAddress=00-15-5d-16-e4-e5; IsolationId=4096;
                         ProviderAddress=10.12.12.35; Type=RemoteSubnetRoute}...}
State                  : 1
Subnets                : {@{AdditionalParams=; AddressPrefix=10.42.249.64/26; Flags=0;
                         GatewayAddress=10.42.249.65; Health=;
                         ID=019DD141-81FA-4F45-B567-356817EA46DC;
                         IpSubnets=System.Object[]; ObjectType=5;
                         Policies=System.Object[]; State=0}}
TotalEndpoints         : 2
Type                   : Overlay
Version                : 55834574851
Resources              : @{AdditionalParams=; AllocationOrder=1;
                         Allocators=System.Object[]; CompartmentOperationTime=0; Flags=0;
                         Health=; ID=8C92A66D-7817-40AE-AFE6-0BB5824D54D9;
                         PortOperationTime=0; State=1; SwitchOperationTime=0;
                         VfpOperationTime=0; parentId=D26B2287-32EE-41BD-A150-EDA2DBB20A30}
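
The RemoteSubnetRoute policies on the Calico HNS network can be compared with the node annotations listed earlier. The sketch below assumes that each node's `IPv4VXLANTunnelAddr` is allocated from that node's own /26 block (an assumption, not something this output proves), and it covers only the four routes visible before the output is truncated with `...`, so the node 10.13.18.9 is left out.

```python
import ipaddress

BLOCK_PREFIX = 26  # blockSize from the IPPool

# Node IP -> IPv4VXLANTunnelAddr, copied from the node annotations.
tunnel_addrs = {
    "10.13.18.42": "10.42.183.128",
    "10.13.18.64": "10.42.170.192",
    "10.13.18.8": "10.42.237.128",
    "10.12.12.35": "10.42.215.129",
}

# ProviderAddress -> DestinationPrefix, copied from the visible HNS policies.
hns_routes = {
    "10.13.18.42": "10.42.183.128/26",
    "10.13.18.8": "10.42.237.128/26",
    "10.13.18.64": "10.42.170.192/26",
    "10.12.12.35": "10.42.215.128/26",
}


def block_of(addr: str) -> str:
    """Return the /26 block that contains addr, in CIDR notation."""
    return str(ipaddress.ip_interface(f"{addr}/{BLOCK_PREFIX}").network)


# Collect routes whose prefix differs from the block implied by the annotation.
mismatches = {
    node: (block_of(tunnel_addrs[node]), prefix)
    for node, prefix in hns_routes.items()
    if block_of(tunnel_addrs[node]) != prefix
}

print("mismatches:", mismatches)
```

An empty result means the visible HNS routes agree with the annotations under that assumption; it does not rule out problems with the truncated entries or with the data path itself.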

Breee • Oct 11 '24 08:10