Service with externalTrafficPolicy: Local in a Dual Stack cluster issues warnings
Environmental Info:
K3s Version: v1.24.4+k3s1 (c3f830e9)
Node(s) CPU architecture, OS, and Version: x86_64 Ubuntu 22.04.1 LTS (Linux Kernel 5.15.0-48-generic)
Cluster Configuration: 1 server
Describe the bug:
A Service with externalTrafficPolicy: Local in a dual-stack (IPv4/IPv6) cluster results in a HealthCheck NodePort bind failure being reported as a warning in events.
Expected behavior: No errors or warnings from the suspected duplicate socket bind.
Actual behavior: Errors are logged suggesting the health-check port is bound more than once:
Sep 21 18:45:32 ganymede k3s[813]: E0921 18:45:32.496643 813 service_health.go:141] "Failed to start healthcheck" err="listen tcp :30458: bind: address already in use" node="<node-name>" service="kube-system/traefik" port=30458
Additional context / logs:
> kubectl get events
NAMESPACE LAST SEEN TYPE REASON OBJECT MESSAGE
kube-system 7m Warning FailedToStartServiceHealthcheck service/traefik node <node-name> failed to start healthcheck "kube-system/traefik" on port 30458: listen tcp :30458: bind: address already in use
> kubectl describe service traefik
Name: traefik
Namespace: kube-system
Labels: app.kubernetes.io/instance=traefik
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=traefik
helm.sh/chart=traefik-10.19.300
Annotations: meta.helm.sh/release-name: traefik
meta.helm.sh/release-namespace: kube-system
Selector: app.kubernetes.io/instance=traefik,app.kubernetes.io/name=traefik
Type: LoadBalancer
IP Family Policy: PreferDualStack
IP Families: IPv4,IPv6
IP: 10.43.210.123
IPs: 10.43.210.123,2001:cafe:42:1::18ea
LoadBalancer Ingress: <external IPv4>, <external IPv6>
Port: web 80/TCP
TargetPort: web/TCP
NodePort: web 30754/TCP
Endpoints: 10.42.1.17:8000
Port: websecure 443/TCP
TargetPort: websecure/TCP
NodePort: websecure 31608/TCP
Endpoints: 10.42.1.17:8443
Session Affinity: None
External Traffic Policy: Local
HealthCheck NodePort: 30458
Events: <none>
> ss -nlpt
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 4096 *:30458 *:* users:(("k3s-server",pid=813,fd=322))
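For context on the ss output: on Linux, a single wildcard TCP listener is dual-stack by default, so the one socket shown as *:30458 already answers health checks over both IP families. A minimal Go sketch (not k3s code; the port number is just the one from this report) showing one wildcard listener reachable via both IPv4 and IPv6:

package main

import (
    "fmt"
    "net"
)

func main() {
    // One wildcard listener: on Linux this is a dual-stack socket,
    // which ss reports as "*:30458".
    l, err := net.Listen("tcp", ":30458")
    if err != nil {
        panic(err)
    }
    defer l.Close()

    // Accept and immediately close incoming connections in the background.
    go func() {
        for {
            c, err := l.Accept()
            if err != nil {
                return
            }
            c.Close()
        }
    }()

    // The same socket is reachable over both IP families.
    for _, addr := range []string{"127.0.0.1:30458", "[::1]:30458"} {
        c, err := net.Dial("tcp", addr)
        if err != nil {
            fmt.Println("dial", addr, "failed:", err)
            continue
        }
        c.Close()
        fmt.Println("connected via", addr)
    }
}

Both dials succeed, which is presumably why the health check keeps working even though a second bind attempt is logged as a failure.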
Steps To Reproduce:
- Installed k3s with the following /etc/rancher/k3s/config.yaml:
node-name: <node-name>
node-ip: <internal IPv4, ULA IPv6>
node-external-ip: <external IPv4, external IPv6>
cluster-cidr: 10.42.0.0/16,2001:cafe:42:0::/56
service-cidr: 10.43.0.0/16,2001:cafe:42:1::/112
kubelet-arg:
- "kube-reserved=cpu=100m,memory=256Mi,ephemeral-storage=1Gi"
- "system-reserved=cpu=100m,memory=256Mi,ephemeral-storage=5Gi"
resolv-conf: /etc/resolv-kubelet.conf
- Applied the following manifest:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    deployment:
      kind: DaemonSet
    image:
      name: traefik
      tag: 2.8.3
    ingressRoute:
      dashboard:
        enabled: false
    service:
      spec:
        externalTrafficPolicy: Local
Can you show the output of kubectl get services -A -o wide and kubectl get endpoints -A, please?
> kubectl get services -A -o wide
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 21h <none>
kube-system kube-dns ClusterIP 10.43.0.10 <none> 53/UDP,53/TCP,9153/TCP 21h k8s-app=kube-dns
kube-system metrics-server ClusterIP 10.43.247.192 <none> 443/TCP 21h k8s-app=metrics-server
kube-system traefik LoadBalancer 10.43.192.217 <external IPv4>,<external IPv6> 80:31148/TCP,443:30504/TCP 21h app.kubernetes.io/instance=traefik,app.kubernetes.io/name=traefik
> kubectl get endpoints -A
NAMESPACE NAME ENDPOINTS AGE
default kubernetes <external IPv4>:6443 21h
kube-system kube-dns 10.42.0.6:53,10.42.0.6:53,10.42.0.6:9153 21h
kube-system metrics-server 10.42.0.2:4443 21h
kube-system traefik 10.42.0.11:8000,10.42.0.11:8443 21h
@barsa-net I could reproduce this problem and I'm slowly getting convinced that you've found a bug in upstream Kubernetes. I don't understand why, but this line https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/iptables/proxier.go#L1501 is called twice for a dual-stack service (traefik) when externalTrafficPolicy is updated to Local, even though I can only see one event:
25032-Sep 23 17:15:18 mbuil-vm0 k3s[1722039]: time="2022-09-23T17:15:18Z" level=info msg="Event(v1.ObjectReference{Kind:\"Service\", Namespace:\"kube-system\", Name:\"traefik\", UID:\"44f91337-2f91-46ab-8379-c3e15dbb5b38\", APIVersion:\"v1\", ResourceVersion:\"614\", FieldPath:\"\"}): type: 'Normal' reason: 'AppliedDaemonSet' Applied LoadBalancer DaemonSet kube-system/svclb-traefik-44f91337"
I guess it runs twice because of the two IP families (IPv4 and IPv6), which is not correct: on the second iteration it tries to bind the same port and fails.
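To make that concrete, here is a minimal Go sketch of the suspected behavior (an illustration, not the actual kube-proxy code): one iteration per IP family, each trying to bind the same HealthCheck NodePort, with the second bind failing exactly like the log line above:

package main

import (
    "fmt"
    "net"
)

func main() {
    const healthCheckNodePort = 30458 // port from this report; any free port behaves the same

    for _, family := range []string{"IPv4", "IPv6"} {
        // Each per-family iteration binds the same wildcard port. On Linux
        // the first bind already creates a dual-stack socket, so the second
        // one collides.
        l, err := net.Listen("tcp", fmt.Sprintf(":%d", healthCheckNodePort))
        if err != nil {
            // This is what surfaces as the FailedToStartServiceHealthcheck warning:
            // "listen tcp :30458: bind: address already in use"
            fmt.Printf("%s: failed to start healthcheck: %v\n", family, err)
            continue
        }
        defer l.Close()
        fmt.Printf("%s: healthcheck listening on %s\n", family, l.Addr())
    }
}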
Unfortunately, this is not something that can be fixed in k3s; it has to be fixed in upstream Kubernetes.
Closing as an upstream issue - the messages look concerning if you're poking through the logs, but everything works fine.
Hi,
was there ever an issue raised in upstream Kubernetes? I have this problem too, using plain Kubernetes rather than k3s, and I'm trying to figure out the current state of this problem.
Thanks :)
@LittleFox94 not on my end; I don't use "vanilla" Kubernetes, and as stated above it's more of an annoying log entry than a failure, at least in k3s.
The issue has already been reported upstream (https://github.com/kubernetes/kubernetes/issues/114702), but with low priority since there is no functional impact: https://github.com/kubernetes/kubernetes/issues/114702#issuecomment-1382364212