
Service with externalTrafficPolicy: Local in a Dual Stack cluster issues warnings

barsa-net opened this issue on Sep 21, 2022

Environmental Info: K3s Version: v1.24.4+k3s1 (c3f830e9)

Node(s) CPU architecture, OS, and Version: x86_64 Ubuntu 22.04.1 LTS (Linux Kernel 5.15.0-48-generic)

Cluster Configuration: 1 server

Describe the bug: Services with externalTrafficPolicy: Local in a Dual Stack (IPv4/IPv6) cluster result in HealthCheck NodePort bind failures being reported as warnings in events.

Expected behavior: No errors or warnings from the suspected duplicate socket bind

Actual behavior: Errors are logged suggesting the HealthCheck NodePort is bound more than once

Sep 21 18:45:32 ganymede k3s[813]: E0921 18:45:32.496643     813 service_health.go:141] "Failed to start healthcheck" err="listen tcp :30458: bind: address already in use" node="<node-name>" service="kube-system/traefik" port=30458
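
For context, the error itself is the standard socket-level failure when a second listener is opened on a port that is already bound; on Linux a wildcard Go TCP listener is typically dual-stack, which matches the single *:30458 socket in the ss output further down. A minimal sketch, assuming nothing about where kube-proxy does this (the port number only mirrors the log above):

package main

import (
    "fmt"
    "net"
)

func main() {
    // First listener succeeds and holds the port, like k3s-server in the ss output below.
    l, err := net.Listen("tcp", ":30458")
    if err != nil {
        panic(err)
    }
    defer l.Close()

    // A second listen on the same wildcard port fails with the error seen in the k3s log:
    // listen tcp :30458: bind: address already in use
    _, err = net.Listen("tcp", ":30458")
    fmt.Println(err)
}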

Additional context / logs:

> kubectl get events
NAMESPACE     LAST SEEN   TYPE      REASON                            OBJECT                      MESSAGE
kube-system   7m          Warning   FailedToStartServiceHealthcheck   service/traefik             node <node-name> failed to start healthcheck "kube-system/traefik" on port 30458: listen tcp :30458: bind: address already in use

> kubectl describe service traefik
Name:                     traefik
Namespace:                kube-system
Labels:                   app.kubernetes.io/instance=traefik
                          app.kubernetes.io/managed-by=Helm
                          app.kubernetes.io/name=traefik
                          helm.sh/chart=traefik-10.19.300
Annotations:              meta.helm.sh/release-name: traefik
                          meta.helm.sh/release-namespace: kube-system
Selector:                 app.kubernetes.io/instance=traefik,app.kubernetes.io/name=traefik
Type:                     LoadBalancer
IP Family Policy:         PreferDualStack
IP Families:              IPv4,IPv6
IP:                       10.43.210.123
IPs:                      10.43.210.123,2001:cafe:42:1::18ea
LoadBalancer Ingress:     <external IPv4>, <external IPv6>
Port:                     web  80/TCP
TargetPort:               web/TCP
NodePort:                 web  30754/TCP
Endpoints:                10.42.1.17:8000
Port:                     websecure  443/TCP
TargetPort:               websecure/TCP
NodePort:                 websecure  31608/TCP
Endpoints:                10.42.1.17:8443
Session Affinity:         None
External Traffic Policy:  Local
HealthCheck NodePort:     30458
Events:                   <none>

> ss -nlpt
State             Recv-Q            Send-Q                       Local Address:Port                        Peer Address:Port            Process
LISTEN            0                 4096                                     *:30458                                  *:*                users:(("k3s-server",pid=813,fd=322))

Steps To Reproduce:

  • Installed k3s with the following /etc/rancher/k3s/config.yaml:
node-name: <node-name>
node-ip: <internal IPv4, ULA IPv6>
node-external-ip: <external IPv4, external IPv6>
cluster-cidr: 10.42.0.0/16,2001:cafe:42:0::/56
service-cidr: 10.43.0.0/16,2001:cafe:42:1::/112
kubelet-arg:
  - "kube-reserved=cpu=100m,memory=256Mi,ephemeral-storage=1Gi"
  - "system-reserved=cpu=100m,memory=256Mi,ephemeral-storage=5Gi"
resolv-conf: /etc/resolv-kubelet.conf
  • Applied the following manifest:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    deployment:
      kind: DaemonSet
    image:
      name: traefik
      tag: 2.8.3
    ingressRoute:
      dashboard:
        enabled: false
    service:
      spec:
        externalTrafficPolicy: Local

barsa-net · Sep 21 '22 19:09

Can you show the output of kubectl get services -A -o wide and kubectl get endpoints -A please?

manuelbuil · Sep 22 '22 17:09

> kubectl get services -A -o wide
NAMESPACE     NAME             TYPE           CLUSTER-IP      EXTERNAL-IP                            PORT(S)                      AGE   SELECTOR
default       kubernetes       ClusterIP      10.43.0.1       <none>                                 443/TCP                      21h   <none>
kube-system   kube-dns         ClusterIP      10.43.0.10      <none>                                 53/UDP,53/TCP,9153/TCP       21h   k8s-app=kube-dns
kube-system   metrics-server   ClusterIP      10.43.247.192   <none>                                 443/TCP                      21h   k8s-app=metrics-server
kube-system   traefik          LoadBalancer   10.43.192.217   <external IPv4>,<external IPv6>        80:31148/TCP,443:30504/TCP   21h   app.kubernetes.io/instance=traefik,app.kubernetes.io/name=traefik

> kubectl get endpoints -A
NAMESPACE     NAME             ENDPOINTS                                  AGE
default       kubernetes       <external IPv4>:6443                       21h
kube-system   kube-dns         10.42.0.6:53,10.42.0.6:53,10.42.0.6:9153   21h
kube-system   metrics-server   10.42.0.2:4443                             21h
kube-system   traefik          10.42.0.11:8000,10.42.0.11:8443            21h

barsa-net · Sep 22 '22 17:09

@barsa-net I could reproduce this problem and I'm slowly getting convinced that you found a bug in upstream Kubernetes. I don't understand why, but this line https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/iptables/proxier.go#L1501 is called twice for a dual-stack service (traefik) when updating externalTrafficPolicy to Local, even though I can only see one event:

25032-Sep 23 17:15:18 mbuil-vm0 k3s[1722039]: time="2022-09-23T17:15:18Z" level=info msg="Event(v1.ObjectReference{Kind:\"Service\", Namespace:\"kube-system\", Name:\"traefik\", UID:\"44f91337-2f91-46ab-8379-c3e15dbb5b38\", APIVersion:\"v1\", ResourceVersion:\"614\", FieldPath:\"\"}): type: 'Normal' reason: 'AppliedDaemonSet' Applied LoadBalancer DaemonSet kube-system/svclb-traefik-44f91337"

I guess it might be doing it twice because of the two IP families (IPv4 and IPv6), which is not correct. On the second iteration it tries to bind to the same port and fails.
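
A rough sketch of the suspected pattern, not the actual kube-proxy code (startHealthCheck and the per-family loop are illustrative): the health check listener is started once per IP family, but both passes target the same HealthCheck NodePort, so the second attempt fails with the error above.

package main

import (
    "fmt"
    "net"
)

// startHealthCheck stands in for the per-service health check listener that
// kube-proxy opens on the service's HealthCheck NodePort.
func startHealthCheck(port int) (net.Listener, error) {
    return net.Listen("tcp", fmt.Sprintf(":%d", port))
}

func main() {
    const healthCheckNodePort = 30458 // from the traefik service above

    // Hypothetical dual-stack sync: one pass per IP family.
    for _, family := range []string{"IPv4", "IPv6"} {
        if _, err := startHealthCheck(healthCheckNodePort); err != nil {
            // Second pass: listen tcp :30458: bind: address already in use
            fmt.Printf("family %s: failed to start healthcheck: %v\n", family, err)
            continue
        }
        fmt.Printf("family %s: healthcheck listening on :%d\n", family, healthCheckNodePort)
    }
}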

Unfortunately, this is not something that can be fixed in k3s; it needs to be fixed in upstream Kubernetes.

manuelbuil · Sep 23 '22 17:09

Closing as an upstream issue - the messages can look concerning if you're poking through the logs, but they are harmless.

brandond · Jan 07 '23 00:01

Hi,

was there ever an issue raised in upstream Kubernetes? I have this problem too, using Kubernetes and not k3s, and I'm trying to figure out the current state of this problem.

Thanks :)

LittleFox94 · Apr 29 '24 09:04

@LittleFox94 not on my end; I don't use "vanilla" Kubernetes, and as stated above it's more of an annoying log entry than a failure, at least in k3s.

barsa-net · Apr 30 '24 09:04

The issue is already reported upstream (https://github.com/kubernetes/kubernetes/issues/114702), but with low priority since there is no functional impact: https://github.com/kubernetes/kubernetes/issues/114702#issuecomment-1382364212

mboukhalfa · May 29 '24 15:05