
No connection to service network in Windows pod when using flannel VXLAN (overlay) network

Open uli-fischer opened this issue 1 year ago • 14 comments

Describe the bug I have a problem setting up a Windows node.

  1. Is there a typo in the documentation? The flannel guide (guides/flannel.md) contains this reference for installing flannel on Windows:
controlPlaneEndpoint=$(kubectl get configmap -n kube-system kube-proxy -o jsonpath="{.data['kubeconfig\.conf']}" | grep server: | sed 's/.*\:\/\///g')
kubernetesServiceHost=$(echo $controlPlaneEndpoint | cut -d ":" -f 1)
kubernetesServicePort=$(echo $controlPlaneEndpoint | cut -d ":" -f 2)
curl -L https://raw.githubusercontent.com/kubernetes-sigs/sig-windows-tools/master/hostprocess/flannel/flanneld/flannel-overlay.yml | sed 's/FLANNEL_VERSION/v0.21.5/g' | sed "s/KUBERNETES_SERVICE_HOST_VALUE/$kubernetesServiceHost/g" | sed "s/KUBERNETES_SERVICE_PORT_VALUE/$kubernetesServicePort/g" | kubectl apply -f -

It refers to version v0.21.5, but the newest version I could find is v0.14.0-hostprocess. I changed it to mik4sa/flannel:v0.21.5, though I'm not sure whether that is correct. With this change the kube-proxy and HostProcess pods come up, but then it fails with the next error: no connection to the service network.

  2. When I install the node and start a pod (see config below), I can ping all networks except the service network, so DNS is not working.
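The host/port extraction in the guide's snippet can be sanity-checked offline. A minimal sketch, where the server: line is a stand-in for what the kubectl jsonpath query returns on this cluster (the address is assumed for illustration):

```shell
# Offline check of the parsing used in the guide's snippet. "$sample" stands
# in for the output of:
#   kubectl get configmap -n kube-system kube-proxy -o jsonpath="{.data['kubeconfig\.conf']}"
# The address below is an assumption for illustration only.
sample="    server: https://10.10.13.201:6443"
controlPlaneEndpoint=$(echo "$sample" | grep server: | sed 's/.*\:\/\///g')
kubernetesServiceHost=$(echo "$controlPlaneEndpoint" | cut -d ":" -f 1)
kubernetesServicePort=$(echo "$controlPlaneEndpoint" | cut -d ":" -f 2)
echo "$kubernetesServiceHost $kubernetesServicePort"   # 10.10.13.201 6443
```

The sed expression strips everything up to and including "://", leaving host:port, which cut then splits.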

To Reproduce On a running cluster, run:

$curl -L https://raw.githubusercontent.com/kubernetes-sigs/sig-windows-tools/master/hostprocess/flannel/flanneld/flannel-overlay.yml | sed 's/sigwindowstools\/flannel:FLANNEL_VERSION/mik4sa\/flannel:v0.21.5/g' | kubectl apply -f -
$curl -L https://raw.githubusercontent.com/kubernetes-sigs/sig-windows-tools/master/hostprocess/flannel/kube-proxy/kube-proxy.yml | sed 's/KUBE_PROXY_VERSION/v1.27.3/g' | kubectl apply -f -
$kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/sig-windows-tools/master/hostprocess/flannel/flanneld/kube-flannel-rbac.yml
$ kubectl get pods -n kube-flannel
NAME                                  READY   STATUS    RESTARTS        AGE
kube-flannel-ds-8vvv2                 1/1     Running   1 (14d ago)     14d
kube-flannel-ds-94v42                 1/1     Running   1 (4d16h ago)   14d
kube-flannel-ds-hhzhk                 1/1     Running   0               14d
kube-flannel-ds-windows-amd64-4wkmb   1/1     Running   0               23h
 $ kubectl describe pod kube-flannel-ds-windows-amd64-4wkmb -n kube-flannel
Name:             kube-flannel-ds-windows-amd64-4wkmb
Namespace:        kube-flannel
Priority:         0
Service Account:  flannel
Node:             k8t-win-node-1/10.10.13.204
Start Time:       Fri, 04 Aug 2023 18:37:22 +0200
Labels:           app=flannel
                  controller-revision-hash=64d67796cc
                  pod-template-generation=8
                  tier=node
Annotations:      <none>
Status:           Running
IP:               10.10.13.204
IPs:
  IP:           10.10.13.204
Controlled By:  DaemonSet/kube-flannel-ds-windows-amd64
Containers:
  kube-flannel:
    Container ID:   containerd://7b86da67e60a8c0d41b0ecdb6523aa84b542dd85f3c7345ec89ab288e44ca331
    Image:          mik4sa/flannel:v0.21.5-hostprocess
    Image ID:       docker.io/mik4sa/flannel@sha256:71b187a72810d9da27d304bbe8557487c69e95c60942f43940074e0d8caecf96
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Fri, 04 Aug 2023 18:37:24 +0200
    Ready:          True
    Restart Count:  0
    Environment:
      CNI_BIN_PATH:             C:\\opt\\cni\\bin
      CNI_CONFIG_PATH:          C:\\etc\\cni\\net.d
      SERVICE_SUBNET:           10.96.0.0/12
      KUBERNETES_SERVICE_HOST:  10.10.13.201
      KUBERNETES_SERVICE_PORT:  6443
      POD_NAME:                 kube-flannel-ds-windows-amd64-4wkmb (v1:metadata.name)
      POD_NAMESPACE:            kube-flannel (v1:metadata.namespace)
    Mounts:
      /mounts/kube-flannel-windows/ from flannel-windows-cfg (rw)
      /mounts/kube-flannel/ from flannel-cfg (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sdkzs (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  flannel-cfg:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-flannel-cfg
    Optional:  false
  flannel-windows-cfg:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-flannel-windows-cfg
    Optional:  false
  kube-api-access-sdkzs:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 :NoSchedule op=Exists
                             :NoExecute op=Exists
                             CriticalAddonsOnly op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:                      <none>

Uses this Testpod Config:

$ cat winTest.yaml
# windows-pod-with-pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-claim
spec:
  storageClassName: synology-iscsi-win
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1G
---
apiVersion: v1
kind: Pod
metadata:
  name: my-windows-pod
spec:
  containers:
  - name: windows-server-container
    image: mcr.microsoft.com/windows/servercore:ltsc2019
    command:
    - powershell.exe
    args:
    - "-NoLogo"
    - "-Command"
    - "while ($true) { Write-Host 'Hello from Windows Server 2019'; Start-Sleep -Seconds 5 }"
    volumeMounts:
    - name: my-pvc-volume
      mountPath: "D:"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values:
            - windows
  volumes:
  - name: my-pvc-volume
    persistentVolumeClaim:
      claimName: test-claim

and run these tests:

$ kubectl get pods  -n kube-system -o=wide
NAME                                   READY   STATUS    RESTARTS          AGE   IP             NODE             NOMINATED NODE   READINESS GATES
coredns-f47c568f5-l4twx                1/1     Running   0                 7d    10.244.0.20    k8t-master-1     <none>           <none>
coredns-f47c568f5-wzpcv                1/1     Running   0                 7d    10.244.1.31    k8t-node-1       <none>           <none>
etcd-k8t-master-1                      1/1     Running   2 (14d ago)       33d   10.10.13.201   k8t-master-1     <none>           <none>
kube-apiserver-k8t-master-1            1/1     Running   368               14d   10.10.13.201   k8t-master-1     <none>           <none>
kube-controller-manager-k8t-master-1   1/1     Running   3 (14d ago)       33d   10.10.13.201   k8t-master-1     <none>           <none>
kube-proxy-4zzsq                       1/1     Running   1 (14d ago)       33d   10.10.13.202   k8t-node-1       <none>           <none>
kube-proxy-7lkjg                       1/1     Running   2 (14d ago)       33d   10.10.13.201   k8t-master-1     <none>           <none>
kube-proxy-8djmb                       1/1     Running   2 (4d16h ago)     33d   10.10.13.203   k8t-node-2       <none>           <none>
kube-proxy-windows-9rqgv               1/1     Running   5 (13d ago)       14d   10.10.13.204   k8t-win-node-1   <none>           <none>
kube-scheduler-k8t-master-1            1/1     Running   3 (14d ago)       33d   10.10.13.201   k8t-master-1     <none>           <none>
snapshot-controller-9695c8478-4xbdj    1/1     Running   438 (4d16h ago)   31d   10.244.2.33    k8t-node-2       <none>           <none>
snapshot-controller-9695c8478-cn6lt    1/1     Running   1 (14d ago)       31d   10.244.1.18    k8t-node-1       <none>           <none>
$ kubectl get pods  -n kube-flannel -o=wide
NAME                                  READY   STATUS    RESTARTS        AGE   IP             NODE             NOMINATED NODE   READINESS GATES
kube-flannel-ds-8vvv2                 1/1     Running   1 (14d ago)     14d   10.10.13.201   k8t-master-1     <none>           <none>
kube-flannel-ds-94v42                 1/1     Running   1 (4d16h ago)   14d   10.10.13.203   k8t-node-2       <none>           <none>
kube-flannel-ds-hhzhk                 1/1     Running   0               14d   10.10.13.202   k8t-node-1       <none>           <none>
kube-flannel-ds-windows-amd64-4wkmb   1/1     Running   0               23h   10.10.13.204   k8t-win-node-1   <none>           <none>
$kubectl exec -it my-windows-pod -- powershell
PS C:\> ipconfig /all

Windows IP Configuration

   Host Name . . . . . . . . . . . . : my-windows-pod
   Primary Dns Suffix  . . . . . . . :
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : default.svc.cluster.local
                                       svc.cluster.local
                                       cluster.local

Ethernet adapter vEthernet (5195b5da3e3bb0b8f92bcbdfce384d2c2a7eac5e55220691f91bdb64dd671f1a_flannel.4096):

   Connection-specific DNS Suffix  . : default.svc.cluster.local
   Description . . . . . . . . . . . : Hyper-V Virtual Ethernet Adapter #4
   Physical Address. . . . . . . . . : 00-15-5D-34-6F-58
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::3c2d:f9da:9aec:d253%29(Preferred)
   IPv4 Address. . . . . . . . . . . : 10.244.4.28(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 10.244.4.1
   DNS Servers . . . . . . . . . . . : 10.96.0.10
   NetBIOS over Tcpip. . . . . . . . : Disabled
   Connection-specific DNS Suffix Search List :
                                       default.svc.cluster.local
                                       svc.cluster.local
                                       cluster.local
PS C:\> nslookup www.google.de
DNS request timed out.
    timeout was 2 seconds.
Server:  UnKnown
Address:  10.96.0.10

DNS request timed out.
    timeout was 2 seconds.
PS C:\> nslookup www.google.de 10.10.13.1    # 10.10.13.1 is my external dns
Server:  UnKnown
Address:  10.10.13.1

Non-authoritative answer:
Name:    www.google.de
Addresses:  2a00:1450:4016:80c::2003
          172.217.16.163
PS C:\> nslookup www.google.de 10.244.0.20
Server:  10-244-0-20.kube-dns.kube-system.svc.cluster.local
Address:  10.244.0.20

Non-authoritative answer:
Name:    www.google.de
Addresses:  2a00:1450:4016:80c::2003
          172.217.16.163
PS C:\>  Test-NetConnection -ComputerName 10.96.0.10 -Port 53 -InformationLevel Detailed
WARNING: TCP connect to (10.96.0.10 : 53) failed
WARNING: Ping to 10.96.0.10 failed with status: TimedOut


ComputerName            : 10.96.0.10
RemoteAddress           : 10.96.0.10
RemotePort              : 53
NameResolutionResults   : 10.96.0.10
MatchingIPsecRules      :
NetworkIsolationContext :
InterfaceAlias          : vEthernet (5195b5da3e3bb0b8f92bcbdfce384d2c2a7eac5e55220691f91bdb64dd671f1a_flannel.4096)
SourceAddress           : 10.244.4.28
NetRoute (NextHop)      : 10.244.4.1
PingSucceeded           : False
PingReplyDetails (RTT)  : 0 ms
TcpTestSucceeded        : False
PS C:\>  Test-NetConnection -ComputerName 10.10.13.1 -Port 53 -InformationLevel Detailed


ComputerName            : 10.10.13.1
RemoteAddress           : 10.10.13.1
RemotePort              : 53
NameResolutionResults   : 10.10.13.1
MatchingIPsecRules      :
NetworkIsolationContext :
InterfaceAlias          : vEthernet (5195b5da3e3bb0b8f92bcbdfce384d2c2a7eac5e55220691f91bdb64dd671f1a_flannel.4096)
SourceAddress           : 10.244.4.28
NetRoute (NextHop)      : 10.244.4.1
TcpTestSucceeded        : True
PS C:\>  Test-NetConnection -ComputerName 10.244.0.20 -Port 53 -InformationLevel Detailed


ComputerName            : 10.244.0.20
RemoteAddress           : 10.244.0.20
RemotePort              : 53
NameResolutionResults   : 10.244.0.20
MatchingIPsecRules      :
NetworkIsolationContext :
InterfaceAlias          : vEthernet (5195b5da3e3bb0b8f92bcbdfce384d2c2a7eac5e55220691f91bdb64dd671f1a_flannel.4096)
SourceAddress           : 10.244.4.28
NetRoute (NextHop)      : 10.244.4.1
TcpTestSucceeded        : True

Expected behavior nslookup www.google.de inside the pod should work.

Kubernetes (please complete the following information):

  • Windows Server version: inside the pod:
PS C:\> Get-ComputerInfo | Select-Object WindowsVersion

WindowsVersion
--------------
1809

Outside the pod:

PS C:\> Get-ComputerInfo | Select-Object WindowsVersion

WindowsVersion
--------------
1809
  • Kubernetes Version:
$ kubectl version --output=yaml
clientVersion:
  buildDate: "2023-06-14T09:53:42Z"
  compiler: gc
  gitCommit: 25b4e43193bcda6c7328a6d147b1fb73a33f1598
  gitTreeState: clean
  gitVersion: v1.27.3
  goVersion: go1.20.5
  major: "1"
  minor: "27"
  platform: linux/arm64
kustomizeVersion: v5.0.1
serverVersion:
  buildDate: "2023-06-14T09:47:40Z"
  compiler: gc
  gitCommit: 25b4e43193bcda6c7328a6d147b1fb73a33f1598
  gitTreeState: clean
  gitVersion: v1.27.3
  goVersion: go1.20.5
  major: "1"
  minor: "27"
  platform: linux/arm64
  • CNI:
$ kubectl get daemonsets -n kube-flannel -o=wide
NAME                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE   CONTAINERS     IMAGES                               SELECTOR
kube-flannel-ds                 3         3         3       3            3           <none>          14d   kube-flannel   docker.io/flannel/flannel:v0.22.0    app=flannel
kube-flannel-ds-windows-amd64   1         1         1       1            1           <none>          14d   kube-flannel   mik4sa/flannel:v0.21.5-hostprocess   app=flannel

Additional context I also tried uweerikmartin/flannel, but without success; I get this error:

 $ kubectl logs kube-flannel-ds-windows-amd64-qtgvk -n kube-flannel
Copying SDN CNI binaries to host


    Directory: C:\opt\cni


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----         7/2/2023   3:25 PM                bin
copy flannel config


    Directory: C:\etc


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----         7/2/2023   4:38 PM                kube-flannel


    Directory: C:\etc\kube-flannel


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
-a----         8/5/2023   9:16 AM            109 net-conf.json


    Directory: C:\hpc\mounts\kube-flannel


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----         8/5/2023   9:16 AM                ..2023_08_05_16_16_31.325047069
d----l         8/5/2023   9:16 AM                ..data
-a---l         8/5/2023   9:16 AM              0 cni-conf.json
-a---l         8/5/2023   9:16 AM              0 net-conf.json
update cni config
get-content : Cannot find path 'C:\hpc\mounts\kubeadm-config\ClusterConfiguration' because it does not exist.
At C:\hpc\flannel\start.ps1:18 char:18
+ ... iceSubnet = get-content $env:CONTAINER_SANDBOX_MOUNT_POINT/mounts/kub ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (C:\hpc\mounts\k...erConfiguration:String) [Get-Content], ItemNotFoundEx
   ception
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand

I will check whether I can solve this error.

uli-fischer avatar Aug 05 '23 07:08 uli-fischer

Tested with oguertlertt/flannel:v0.22.0. Same problem. 👎 No clue what is wrong.

$ kubectl get daemonsets -n kube-flannel -o=wide
NAME                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE   CONTAINERS     IMAGES                                    SELECTOR
kube-flannel-ds                 3         3         3       3            3           <none>          14d   kube-flannel   docker.io/flannel/flannel:v0.22.0         app=flannel
kube-flannel-ds-windows-amd64   1         1         1       1            1           <none>          14d   kube-flannel   oguertlertt/flannel:v0.22.0-hostprocess   app=flannel

I get these errors in the log of the HostProcess pod:

$ kubectl logs kube-flannel-ds-windows-amd64-bf4bc -n kube-flannel -f
Copying SDN CNI binaries to host


    Directory: C:\opt\cni


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----         7/2/2023   3:25 PM                bin
copy flannel config


    Directory: C:\etc


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----         7/2/2023   4:38 PM                kube-flannel


    Directory: C:\etc\kube-flannel


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
-a----         8/5/2023   9:39 AM            109 net-conf.json


    Directory: C:\hpc\mounts\kube-flannel


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----         8/5/2023   9:45 AM                ..2023_08_05_16_45_21.1802066358
d----l         8/5/2023   9:45 AM                ..data
-a---l         8/5/2023   9:45 AM              0 cni-conf.json
-a---l         8/5/2023   9:45 AM              0 net-conf.json
update cni config


    Directory: C:\etc\cni


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----         7/2/2023   4:38 PM                net.d
add route
The route addition failed: The object already exists.

envs
kube-flannel-ds-windows-amd64-bf4bc
kube-flannel
Starting flannel
I0805 09:45:24.184051  246316 main.go:212] CLI flags config: {etcdEndpoints:http://127.0.0.1:4001,http://127.0.0.1:2379 etcdPrefix:/coreos.com/network etcdKeyfile: etcdCertfile: etcdCAFile: etcdUsername: etcdPassword: version:false kubeSubnetMgr:true kubeApiUrl: kubeAnnotationPrefix:flannel.alpha.coreos.com kubeConfigFile: iface:[10.10.13.204] ifaceRegex:[] ipMasq:false ifaceCanReach: subnetFile:/run/flannel/subnet.env publicIP: publicIPv6: subnetLeaseRenewMargin:60 healthzIP:0.0.0.0 healthzPort:0 iptablesResyncSeconds:5 iptablesForwardRules:true netConfPath:/etc/kube-flannel/net-conf.json setNodeNetworkUnavailable:true useMultiClusterCidr:false}
W0805 09:45:24.186802  246316 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0805 09:45:24.254216  246316 kube.go:486] Starting kube subnet manager
I0805 09:45:24.254216  246316 kube.go:145] Waiting 10m0s for node controller to sync
I0805 09:45:24.267665  246316 kube.go:507] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.244.0.0/24]
I0805 09:45:24.267665  246316 kube.go:507] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.244.1.0/24]
I0805 09:45:24.267665  246316 kube.go:507] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.244.2.0/24]
I0805 09:45:24.267665  246316 kube.go:507] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.244.4.0/24]
I0805 09:45:25.255069  246316 kube.go:152] Node controller sync successful
I0805 09:45:25.255069  246316 main.go:232] Created subnet manager: Kubernetes Subnet Manager - k8t-win-node-1
I0805 09:45:25.255069  246316 main.go:235] Installing signal handlers
I0805 09:45:25.255713  246316 main.go:543] Found network config - Backend type: vxlan
I0805 09:45:25.256988  246316 match.go:73] Searching for interface using 10.10.13.204
I0805 09:45:25.272671  246316 match.go:259] Using interface with name vEthernet (Ethernet) and address 10.10.13.204
I0805 09:45:25.275260  246316 match.go:281] Defaulting external address to interface address (10.10.13.204)
I0805 09:45:25.275327  246316 vxlan_windows.go:126] VXLAN config: Name=flannel.4096 MacPrefix=0E-2A VNI=4096 Port=4789 GBP=false DirectRouting=false
time="2023-08-05T09:45:25-07:00" level=info msg="HCN feature check" supportedFeatures="{Acl:{AclAddressLists:true AclNoHostRulePriority:true AclPortRanges:true AclRuleId:true} Api:{V1:true V2:true} RemoteSubnet:true HostRoute:true DSR:true Slash32EndpointPrefixes:true AclSupportForProtocol252:false SessionAffinity:false IPv6DualStack:false SetPolicy:false VxlanPort:false L4Proxy:true L4WfpProxy:false TierAcl:false NetworkACL:false NestedIpSet:false}" version="{Major:9 Minor:5}"
I0805 09:45:25.354942  246316 kube.go:507] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.244.4.0/24]
I0805 09:45:25.354994  246316 device_windows.go:103] Found existing HostComputeNetwork flannel.4096
I0805 09:45:25.381913  246316 main.go:408] Changing default FORWARD chain policy to ACCEPT
I0805 09:45:25.383161  246316 kube.go:507] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.244.4.0/24]
I0805 09:45:25.383739  246316 main.go:436] Wrote subnet file to /run/flannel/subnet.env
I0805 09:45:25.383739  246316 main.go:440] Running backend.
I0805 09:45:25.383739  246316 vxlan_network_windows.go:63] Watching for new subnet leases
I0805 09:45:25.383739  246316 subnet.go:159] Batch elem [0] is { lease.Event{Type:0, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xaf40000, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xa0a0dc9, PublicIPv6:(*ip.IP6)(nil), BackendType:"vxlan", BackendData:json.RawMessage{0x7b, 0x22, 0x56, 0x4e, 0x49, 0x22, 0x3a, 0x34, 0x30, 0x39, 0x36, 0x2c, 0x22, 0x56, 0x74, 0x65, 0x70, 0x4d, 0x41, 0x43, 0x22, 0x3a, 0x22, 0x38, 0x32, 0x3a, 0x62, 0x66, 0x3a, 0x66, 0x32, 0x3a, 0x33, 0x65, 0x3a, 0x30, 0x65, 0x3a, 0x62, 0x33, 0x22, 0x7d}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
I0805 09:45:25.383739  246316 subnet.go:159] Batch elem [0] is { lease.Event{Type:0, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xaf40100, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xa0a0dca, PublicIPv6:(*ip.IP6)(nil), BackendType:"vxlan", BackendData:json.RawMessage{0x7b, 0x22, 0x56, 0x4e, 0x49, 0x22, 0x3a, 0x34, 0x30, 0x39, 0x36, 0x2c, 0x22, 0x56, 0x74, 0x65, 0x70, 0x4d, 0x41, 0x43, 0x22, 0x3a, 0x22, 0x64, 0x32, 0x3a, 0x30, 0x33, 0x3a, 0x35, 0x62, 0x3a, 0x34, 0x32, 0x3a, 0x32, 0x30, 0x3a, 0x33, 0x39, 0x22, 0x7d}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
I0805 09:45:25.388437  246316 subnet.go:159] Batch elem [0] is { lease.Event{Type:0, Lease:lease.Lease{EnableIPv4:true, EnableIPv6:false, Subnet:ip.IP4Net{IP:0xaf40200, PrefixLen:0x18}, IPv6Subnet:ip.IP6Net{IP:(*ip.IP6)(nil), PrefixLen:0x0}, Attrs:lease.LeaseAttrs{PublicIP:0xa0a0dcb, PublicIPv6:(*ip.IP6)(nil), BackendType:"vxlan", BackendData:json.RawMessage{0x7b, 0x22, 0x56, 0x4e, 0x49, 0x22, 0x3a, 0x34, 0x30, 0x39, 0x36, 0x2c, 0x22, 0x56, 0x74, 0x65, 0x70, 0x4d, 0x41, 0x43, 0x22, 0x3a, 0x22, 0x38, 0x32, 0x3a, 0x61, 0x33, 0x3a, 0x35, 0x39, 0x3a, 0x36, 0x63, 0x3a, 0x63, 0x36, 0x3a, 0x62, 0x61, 0x22, 0x7d}, BackendV6Data:json.RawMessage(nil)}, Expiration:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Asof:0}} }
I0805 09:45:25.426776  246316 main.go:461] Waiting for all goroutines to exit
....

uli-fischer avatar Aug 05 '23 07:08 uli-fischer

I have no time this weekend to assist you. I haven't read your description deeply tbh, but try to:

  1. build your own images for flannel and kube-proxy (I'm using v0.21.5 for flannel currently)
  2. Exactly follow the guide step by step but use your own images
  3. Then everything should work

Note: The RBAC file you are executing is no longer needed and should be deleted from the cluster when upgrading

Mik4sa avatar Aug 05 '23 08:08 Mik4sa

Okay, I'll check this, but unfortunately I'm very much a newbie in Kubernetes/Docker, so building a Windows image is difficult :-) I'll try it and come back to you.

Regarding "The RBAC file you are executing is no longer needed and should be deleted from the cluster when upgrading": I've seen this and will check whether the RBAC is the correct one.

Do you know why there is no newer version in sigwindowstools, and whether this is an error in the documentation?

uli-fischer avatar Aug 05 '23 09:08 uli-fischer

@uli-fischer I built this recently: docker.io/syck0/flannel:v0.21.5-hostprocess. It works for me. Try it out

iankingori avatar Aug 14 '23 15:08 iankingori

Use your own images, or use [Mik4sa]'s v0.21.5 for flannel currently. Please see #336

FangKee avatar Aug 15 '23 07:08 FangKee

OK I've tested it with all versions mentioned. No changes here. Upon further investigation, I found this error in the Windows kube proxy. I think that could be the error, but have no idea what's wrong.

I0822 07:12:51.168375   23324 config.go:133] "Calling handler.OnEndpointSliceUpdate"
I0822 07:13:01.157594   23324 config.go:133] "Calling handler.OnEndpointSliceUpdate"
I0822 07:13:11.162851   23324 config.go:133] "Calling handler.OnEndpointSliceUpdate"
I0822 07:13:14.464328   23324 hns.go:135] "Queried endpoints from network" network="flannel.4096"
I0822 07:13:14.464441   23324 hns.go:136] "Queried endpoints details" network="flannel.4096" endpointInfos=map[10.244.7.3:10.244.7.3:0 8f57c1ba-d61d-4a9c-9a92-5dadf07250dc:10.244.7.3:0]
I0822 07:13:14.464441   23324 hns.go:306] "Queried load balancers" count=0
E0822 07:13:14.477518   23324 proxier.go:1236] "Source Vip endpoint creation failed" err="hcnCreateEndpoint failed in Win32: IP address is either invalid or not part of any configured subnet(s). (0x803b001e) {\"Success\":false,\"Error\":\"IP address is either invalid or not part of any configured subnet(s). \",\"ErrorCode\":2151350302}"
I0822 07:13:14.477693   23324 proxier.go:1177] "Syncing proxy rules complete" elapsed="18.6334ms"
I0822 07:13:14.477693   23324 bounded_frequency_runner.go:296] sync-runner: ran, next possible in 1s, periodic in 30s
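The "IP address is either invalid or not part of any configured subnet(s)" error means HNS rejected the source-VIP endpoint because its IP lies outside every subnet of the flannel.4096 network. The check that fails is plain subnet membership; a POSIX-shell sketch of that test (illustrative only, not actual HNS code):

```shell
# Subnet-membership check, the test HNS effectively performs when creating
# an endpoint. Illustrative sketch only -- not the actual HNS implementation.
ip_to_int() {
    oldifs=$IFS; IFS=.
    set -- $1            # split dotted quad on '.'
    IFS=$oldifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

in_cidr() {  # usage: in_cidr IP NETWORK/PREFIXLEN
    net=${2%/*}; len=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# A VIP from an old node subnet fails once the node is assigned a new one:
in_cidr 10.244.4.2 10.244.4.0/24 && echo "in subnet"
in_cidr 10.244.4.2 10.244.7.0/24 || echo "not in subnet -> hcnCreateEndpoint fails"
```

In the log above the pod endpoint is 10.244.7.3, i.e. the node's current subnet is 10.244.7.0/24, so a source VIP cached from an earlier 10.244.X.0/24 assignment would be rejected exactly like this.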

uli-fischer avatar Aug 22 '23 05:08 uli-fischer

Hi Pexeus, sorry, I have not found a solution for this so far. My next step is to build my own images and try once more, but I've had no time yet. If you find a solution, please let me know.

uli-fischer avatar Dec 17 '23 08:12 uli-fischer

How did you initialize your cluster with kubeadm? Do you still have the exact command?

Mik4sa avatar Dec 17 '23 09:12 Mik4sa

Hi

As documented, I ran sudo kubeadm init --pod-network-cidr=10.244.0.0/16 on the Debian master node.

uli-fischer avatar Dec 17 '23 12:12 uli-fischer

Hmm, this is actually the same command I used.

Mik4sa avatar Dec 17 '23 13:12 Mik4sa

troubleshooting this issue as well... trying out a bunch of new stuff... considering refactoring my setup to host-gw...

  • k8s v1.28.2
  • windows server 2022
  • flannel v0.24.0

as for this error in kube-proxy:

E0822 07:13:14.477518   23324 proxier.go:1236] "Source Vip endpoint creation failed" err="hcnCreateEndpoint failed in Win32: IP address is either invalid or not part of any configured subnet(s). (0x803b001e) {\"Success\":false,\"Error\":\"IP address is either invalid or not part of any configured subnet(s). \",\"ErrorCode\":2151350302}"

check the kube-proxy start script. did you unjoin and rejoin your windows worker node to the cluster? flannel probably decided to pick a new 10.244.X.0/24 subnet for your node. the logic in the script checks for an existing file:

https://github.com/kubernetes-sigs/sig-windows-tools/blob/master/hostprocess/flannel/kube-proxy/start.ps1#L9-L10

try deleting the contents of C:\sourcevip and restarting the windows kube-proxy
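The caching behaviour that script implements can be sketched as follows (PowerShell logic rendered as POSIX shell for illustration; the path and addresses are stand-ins, and the "compute" step is simplified):

```shell
# Sketch of kube-proxy start.ps1's source-VIP caching. The cache file is
# only written when absent, so a VIP computed for an old pod subnet is
# reused after a node rejoin until the cache is deleted.
sourceVipFile=$(mktemp -d)/sourceVip.json   # stands in for C:\sourcevip\sourceVip.json

get_source_vip() {
    if [ ! -f "$sourceVipFile" ]; then
        # the real script reserves an address from the node's current
        # flannel subnet here; we just take the caller's argument
        echo "$1" > "$sourceVipFile"
    fi
    cat "$sourceVipFile"
}

get_source_vip 10.244.4.2    # first run: computes and caches 10.244.4.2
get_source_vip 10.244.7.2    # after rejoin: still returns stale 10.244.4.2
rm -f "$sourceVipFile"       # the suggested fix: clear the cache
get_source_vip 10.244.7.2    # now returns 10.244.7.2
```

This is why a stale cache produces the "not part of any configured subnet(s)" failure above: kube-proxy keeps handing HNS a VIP from the old subnet.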

Zombro avatar Dec 19 '23 23:12 Zombro

as for the reported issue, the following stands out to me. inside of the test my-windows-pod, the vEthernet adapter looks configured properly...

PS C:\> ipconfig /all

Windows IP Configuration

   Host Name . . . . . . . . . . . . : my-windows-pod
   Primary Dns Suffix  . . . . . . . : 
   Node Type . . . . . . . . . . . . : Hybrid
   IP Routing Enabled. . . . . . . . : No
   WINS Proxy Enabled. . . . . . . . : No
   DNS Suffix Search List. . . . . . : development.svc.cluster.local
                                       svc.cluster.local
                                       cluster.local

Ethernet adapter vEthernet (d804a0f1ccc4bceb0754f85022a8a16fb9db520b948689f7a4d9ba4b26c44082_flannel.4096):

   Connection-specific DNS Suffix  . : development.svc.cluster.local
   Description . . . . . . . . . . . : Hyper-V Virtual Ethernet Container Adapter #4
   Physical Address. . . . . . . . . : 00-15-5D-CD-19-B6
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::be82:2ea:9ff1:e51f%53(Preferred) 
   IPv4 Address. . . . . . . . . . . : 10.244.11.7(Preferred) 
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 10.244.11.1
   DNS Servers . . . . . . . . . . . : 10.96.0.10
   NetBIOS over Tcpip. . . . . . . . : Disabled
   Connection-specific DNS Suffix Search List :
                                       development.svc.cluster.local
                                       svc.cluster.local
                                       cluster.local

... but the routes are screwed up. i expect to see something for 10.244.0.0/16 at least in if53

PS C:\> Get-NetRoute

ifIndex DestinationPrefix                              NextHop                                  RouteMetric ifMetric PolicyStore
------- -----------------                              -------                                  ----------- -------- -----------
53      255.255.255.255/32                             0.0.0.0                                          256 25       ActiveStore
52      255.255.255.255/32                             0.0.0.0                                          256 75       ActiveStore
53      224.0.0.0/4                                    0.0.0.0                                          256 25       ActiveStore
52      224.0.0.0/4                                    0.0.0.0                                          256 75       ActiveStore
52      127.255.255.255/32                             0.0.0.0                                          256 75       ActiveStore
52      127.0.0.1/32                                   0.0.0.0                                          256 75       ActiveStore
52      127.0.0.0/8                                    0.0.0.0                                          256 75       ActiveStore
53      10.244.11.255/32                               0.0.0.0                                          256 25       ActiveStore
53      10.244.11.7/32                                 0.0.0.0                                          256 25       ActiveStore
53      10.244.11.0/24                                 0.0.0.0                                          256 25       ActiveStore
53      0.0.0.0/0                                      10.244.11.1                                      256 25       ActiveStore

everything linux is working fine.

my theory is

  • some of the internal windows flannel machinery isn't creating routes properly or logging it
  • my CNI configuration is wrong, although i started with the example in this repository and haven't deviated much

Zombro avatar Dec 20 '23 00:12 Zombro

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Mar 19 '24 00:03 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Apr 18 '24 01:04 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar May 18 '24 01:05 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar May 18 '24 01:05 k8s-ci-robot