Pods with hostNetwork=true can't connect to Kube API Server

Open ogallart23 opened this issue 1 year ago • 1 comments

What happened?

When I deploy kube-router with all features enabled (without kube-proxy) and then deploy Traefik with hostNetwork=true, the pod can't reach https://10.96.0.1:443/version; every request times out.

What did you expect to happen?

A pod deployed with hostNetwork=true should be able to reach the Kube API server. This works if I disable the service proxy in kube-router and enable kube-proxy, but I want to use only kube-router.

How can we reproduce the behavior you experienced? Steps to reproduce the behavior:

  1. Create a k8s cluster (in my case with k0s).
  2. Deploy kube-router with all features.
  3. Deploy Traefik with hostNetwork=true. It can't reach the Kube API Service.

**Screenshots / Architecture Diagrams / Network Topologies** If applicable, add those here to help explain your problem.

Traefik logs:

k0s kubectl logs adparts-adgest-traefik-ff44b5bdb-h544b -n adparts
time="2024-04-27T18:10:09Z" level=info msg="Configuration loaded from flags."
time="2024-04-27T18:10:21Z" level=error msg="Error watching kubernetes events: could not retrieve server version: Get \"https://10.96.0.1:443/version\": net/http: TLS handshake timeout" providerName=kubernetes
time="2024-04-27T18:10:22Z" level=error msg="Provider connection error: could not retrieve server version: Get \"https://10.96.0.1:443/version\": net/http: TLS handshake timeout; retrying in 519.955504ms" providerName=kubernetes
time="2024-04-27T18:10:52Z" level=error msg="Error watching kubernetes events: could not retrieve server version: Get \"https://10.96.0.1:443/version\": dial tcp 10.96.0.1:443: i/o timeout" providerName=kubernetes
time="2024-04-27T18:10:53Z" level=error msg="Provider connection error: could not retrieve server version: Get \"https://10.96.0.1:443/version\": dial tcp 10.96.0.1:443: i/o timeout; retrying in 390.195589ms" providerName=kubernetes
time="2024-04-27T18:11:24Z" level=error msg="Error watching kubernetes events: could not retrieve server version: Get \"https://10.96.0.1:443/version\": dial tcp 10.96.0.1:443: i/o timeout" providerName=kubernetes
time="2024-04-27T18:11:25Z" level=error msg="Provider connection error: could not retrieve server version: Get \"https://10.96.0.1:443/version\": dial tcp 10.96.0.1:443: i/o timeout; retrying in 468.528865ms" providerName=kubernetes
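
The log above shows two different failure stages: a `TLS handshake timeout` first, then `dial tcp ... i/o timeout`. To tell which stage fails on a given node, a small probe run from the host network namespace may help. This is only a sketch: the function name `probe_tls_endpoint` and its return strings are made up for illustration, and certificate verification is deliberately disabled because the goal is reachability, not trust.

```python
import socket
import ssl


def probe_tls_endpoint(host: str, port: int, timeout: float = 5.0) -> str:
    """Report which stage fails when connecting to a TLS endpoint.

    Returns a string starting with "tcp:" if the TCP connect fails,
    "tls:" if the connect succeeds but the TLS handshake fails,
    or "ok:" followed by the negotiated TLS version.
    """
    try:
        # Stage 1: plain TCP connect (the "dial tcp" step in the Traefik log).
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return f"tcp: {exc}"

    # Verification is switched off on purpose: this probe only checks
    # whether packets flow in both directions, not whether the cert is valid.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    try:
        # Stage 2: TLS handshake; the socket timeout set above still applies.
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return f"ok: {tls.version()}"
    except OSError as exc:  # ssl.SSLError and timeouts are OSError subclasses
        return f"tls: {exc}"
    finally:
        sock.close()
```

Calling `probe_tls_endpoint("10.96.0.1", 443)` directly on the node and again against a control-plane node's real API server address would show whether only the ClusterIP path is affected.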

**System Information (please complete the following information):**

  • Kube-Router Version (kube-router --version): v2.1.0
  • Kube-Router Parameters: all the default params in all-features-daemonset.yaml
  • Kubernetes Version (kubectl version) : 1.29
  • Cloud Type: On Premise
  • Kubernetes Deployment Type: K0S
  • Kube-Router Deployment Type: DaemonSet
  • Cluster Size: 4 nodes

**Logs, other output, metrics** Please provide logs, other kind of output or observed metrics here.

Additional context

As far as I can see in iptables, kube-proxy creates a KUBE-SERVICES chain that allows traffic to 10.96.0.1 on port 443, but kube-router doesn't create a similar chain or rule.
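
For comparison, kube-router's service proxy is built on IPVS, so the counterpart of a KUBE-SERVICES rule would be an IPVS virtual service for 10.96.0.1:443 rather than an iptables chain. The sketch below parses the text printed by `ipvsadm -Ln` and reports the backends behind a given virtual service. The helper name `real_servers_for` and the sample output (including the backend address 192.168.1.10:6443) are illustrative assumptions, not data from this cluster.

```python
from typing import List, Optional

# Illustrative sample only; real output comes from running `ipvsadm -Ln` on a node.
SAMPLE_IPVSADM_OUTPUT = """\
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.96.0.1:443 rr
  -> 192.168.1.10:6443            Masq    1      0          0
TCP  10.96.0.10:53 rr
"""


def real_servers_for(output: str, vip: str, port: int,
                     proto: str = "TCP") -> Optional[List[str]]:
    """Return the backends of an IPVS virtual service.

    None means the virtual service is not present at all;
    an empty list means it exists but has no real servers.
    """
    target = f"{vip}:{port}"
    found = False
    in_target = False
    backends: List[str] = []
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "->":
            # Backend line; only collect it while inside the matching service.
            if in_target:
                backends.append(fields[1])
        else:
            # Header line or a new virtual service line.
            in_target = (len(fields) >= 2
                         and fields[0] == proto
                         and fields[1] == target)
            found = found or in_target
    return backends if found else None
```

On a node, the input could come from `subprocess.run(["ipvsadm", "-Ln"], capture_output=True, text=True).stdout` (run as root). A result of `None` for 10.96.0.1:443 would suggest the service proxy never programmed the API server ClusterIP on that node.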

Thanks in advance

ogallart23, Apr 27 '24 07:04

I don't run k0s, so it might be a while before I can find time to set up an environment and test it myself.

However, it sounds like kube-router may not be starting, and as such it is not creating the kube-apiserver ClusterIP that Traefik needs. I would imagine that this would most likely come about from kube-router not being able to talk to the kube-apiserver.

Are you able to see any logs from kube-router?

aauren, May 12 '24 23:05

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot], Jun 12 '24 02:06

This issue was closed because it has been stale for 5 days with no activity.

github-actions[bot], Jun 17 '24 02:06