
Traffic seems to be routed via nodes' public IPs instead of private IPs

Open · namelessvoid opened this issue 4 years ago · 36 comments

What happened:

I have a KubeOne cluster set up at Hetzner via the example Terraform scripts, which include a private network. The only change we made is adding worker pools for a list of datacenters:

variable "datacenters" {
  type = list(string)
  default = ["nbg1", "fsn1"]
}

output "kubeone_workers" {
  description = "Workers definitions, that will be transformed into MachineDeployment object"

  value = {
    for idx, datacenter in var.datacenters: 

    # following outputs will be parsed by kubeone and automatically merged into
    # corresponding (by name) worker definition
    "${var.cluster_name}-pool${idx + 1}" => {
      replicas = var.workers_replicas
      providerSpec = {
        sshPublicKeys   = [file(var.ssh_public_key_file)]
        operatingSystem = var.worker_os
        operatingSystemSpec = {
          distUpgradeOnBoot = false
        }
        cloudProviderSpec = {
          # provider specific fields:
          # see example under `cloudProviderSpec` section at:
          # https://github.com/kubermatic/machine-controller/blob/master/examples/hetzner-machinedeployment.yaml
          serverType = var.worker_type
          location   = datacenter
          image      = var.image
          networks = [
            hcloud_network.net.id
          ]
          # Datacenter (optional)
          # datacenter = ""
          labels = {
            "${var.cluster_name}-workers" = "pool1"
          }
        }
      }
    }
  }
}

The resulting nodes look like this:

# kubectl get nodes -o wide
NAME                             STATUS   ROLES                  AGE   VERSION   INTERNAL-IP   EXTERNAL-IP       OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
staging-control-plane-1          Ready    control-plane,master   29d   v1.20.6   192.168.0.3   195.201.XXX.XXX   Ubuntu 20.04.2 LTS   5.4.0-72-generic   docker://19.3.14
staging-control-plane-2          Ready    control-plane,master   29d   v1.20.6   192.168.0.5   162.55.XXX.XXX    Ubuntu 20.04.2 LTS   5.4.0-72-generic   docker://19.3.14
staging-control-plane-3          Ready    control-plane,master   29d   v1.20.6   192.168.0.4   195.201.XXX.XXX   Ubuntu 20.04.2 LTS   5.4.0-72-generic   docker://19.3.14
staging-pool1-5d679cf75-464fm    Ready    <none>                 16h   v1.20.6   192.168.0.9   195.201.XXX.XXX   Ubuntu 20.04.2 LTS   5.4.0-72-generic   docker://19.3.15
staging-pool2-84c786cf67-dxf9p   Ready    <none>                 17h   v1.20.6   192.168.0.7   162.55.XXX.XXX    Ubuntu 20.04.2 LTS   5.4.0-72-generic   docker://19.3.15

When I traceroute a kubernetes service (e.g. backend.default.svc.cluster.local) I see that the traffic is routed via the public IP of the nodes instead of the IP within the private network:

# kubectl run ubuntu --image ubuntu -- sleep infinity
# kubectl exec -it ubuntu -- bash
root@ubuntu:/# traceroute backend.default.svc.cluster.local
traceroute to backend.default.svc.cluster.local (10.110.120.137), 30 hops max, 60 byte packets
 1  static.XXX.XXX.55.162.clients.your-server.de (162.55.XXX.XXX)  0.252 ms  0.052 ms  0.015 ms
 2  172.31.1.1 (172.31.1.1)  11.795 ms  11.217 ms  11.723 ms
 3  11202.your-cloud.host (159.69.96.89)  0.272 ms  0.410 ms  0.379 ms
 4  * * *
 5  spine1.cloud2.fsn1.hetzner.com (213.239.225.41)  0.954 ms  1.292 ms  1.262 ms
 6  core23.fsn1.hetzner.com (213.239.239.137)  3.423 ms core23.fsn1.hetzner.com (213.239.239.125)  2.076 ms core24.fsn1.hetzner.com (213.239.239.133)  4.113 ms
 7  core11.nbg1.hetzner.com (213.239.245.225)  6.156 ms core11.nbg1.hetzner.com (213.239.203.125)  10.360 ms  6.085 ms^C

Here 162.55.XXX.XXX is the public IP of the node. I'd expect the traffic to be sent to 192.168.0.7 instead. I checked on a GKE cluster for comparison, and there the traffic appears to be routed via the private IPs.

As a consequence, if I apply a firewall that blocks access to the nodes' public IPs, the cluster networking becomes non-operational in the sense that DNS lookups no longer work and services cannot be reached.

What is the expected behavior:

In-cluster traffic should be routed via private IPs and not via public IPs. I should also be able to restrict public node IP access via firewall and the cluster should stay operational.
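One way to verify which path the traffic actually takes is to capture on the node's interfaces while pods talk to each other. This is only a sketch: the interface names (eth0/ens10) and the VXLAN port 8472 are assumptions that depend on the image and the CNI in use.

```shell
# on a worker node: watch for pod-to-pod overlay traffic
tcpdump -ni eth0  'udp port 8472'   # public interface -- ideally stays quiet
tcpdump -ni ens10 'udp port 8472'   # Hetzner private network interface
```

If the overlay packets show up on the public interface, the CNI is using the nodes' public IPs as its underlay.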

How to reproduce the issue:

I did not try it with a fresh install, but the steps to reproduce should be:

  1. Install kubeone on Hetzner with the default terraform templates
  2. Create some default pod with a service (nginx should suffice)
  3. Create a second pod (e.g. ubuntu) and traceroute the service created in 2).

Anything else we need to know?

Information about the environment:

  • KubeOne version (kubeone version): the cluster was created with KubeOne 1.2.1 but was recently updated to 1.2.2 and then 1.2.3. MachineDeployments have been restarted via https://docs.kubermatic.com/kubeone/master/cheat_sheets/rollout_machinedeployment/
  • Operating system: Ubuntu 20.04.2 LTS
  • Provider you're deploying cluster on: Hetzner
  • Operating system you're deploying on: macOS

Hope you can help me with that! Thank you a lot!

namelessvoid avatar Jun 16 '21 07:06 namelessvoid

I'm not sure cross-datacenter traffic can be sent over the private IPs. I suppose that question should be directed at Hetzner Cloud themselves.

kron4eg avatar Jun 16 '21 09:06 kron4eg

OK, I've tried creating VMs in different DCs and they are able to communicate with each other over the private network.

kron4eg avatar Jun 16 '21 10:06 kron4eg

@namelessvoid can you please build kubeone from latest master and try that?

I'm getting different results

root@ubuntu:/# traceroute 10.244.7.2
traceroute to 10.244.7.2 (10.244.7.2), 30 hops max, 60 byte packets
 1  static.123.164.55.162.clients.your-server.de (162.55.164.123)  0.156 ms  0.066 ms  0.073 ms
 2  10.244.7.0 (10.244.7.0)  4.442 ms  4.234 ms  4.131 ms
 3  10.244.7.2 (10.244.7.2)  4.282 ms  4.042 ms  3.844 ms

where 10.244.7.2 is overlay IP of the pod running on the other datacenter.

kron4eg avatar Jun 16 '21 10:06 kron4eg

Maybe I'm getting it wrong, but shouldn't the first hop be the private network IP of your node? 162.55.164.123 is the public IP, isn't it? Disclaimer: I'm not too deep into k8s networking 🙈

I'll try latest master as soon as I can (I'm a bit tied up with releases right now).

namelessvoid avatar Jun 17 '21 12:06 namelessvoid

Just for completeness, I tried a fresh cluster installed with kubeone 1.2.3 and see these results:

 1  static.170.210.55.162.clients.your-server.de (162.55.210.170)  0.147 ms  0.033 ms  0.024 ms
 2  172.31.1.1 (172.31.1.1)  13.458 ms  13.301 ms  13.008 ms
 3  11685.your-cloud.host (195.201.67.143)  0.607 ms  0.478 ms  0.515 ms

Then I built kubeone from master and retried on another freshly installed cluster:

root@ubuntu:/# traceroute nginx.default.svc.cluster.local
traceroute to nginx.default.svc.cluster.local (10.103.180.121), 30 hops max, 60 byte packets
 1  static.97.89.201.138.clients.your-server.de (138.201.89.97)  0.062 ms  0.027 ms  0.022 ms
 2  172.31.1.1 (172.31.1.1)  14.458 ms  14.358 ms  14.322 ms
 3  12740.your-cloud.host (136.243.181.165)  0.512 ms  0.447 ms  0.395 ms

kubeone version for the self-built one shows

{
  "kubeone": {
    "major": "1",
    "minor": "2",
    "gitVersion": "v1.2.0-rc.0-65-gab496ef",
    "gitCommit": "ab496efdaa222e92f14a1d0cbe63149d57f8cc53",
    "gitTreeState": "",
    "buildDate": "2021-06-22T11:48:09+02:00",
    "goVersion": "go1.16.5",
    "compiler": "gc",
    "platform": "darwin/amd64"
  },
  "machine_controller": {
    "major": "1",
    "minor": "30",
    "gitVersion": "v1.30.0",
    "gitCommit": "",
    "gitTreeState": "",
    "buildDate": "",
    "goVersion": "",
    "compiler": "",
    "platform": "linux/amd64"
  }
}

Test setup:

$ kubectl run nginx --image nginx
$ kubectl expose pod nginx --port 80
$ kubectl run ubuntu --image ubuntu -- sleep infinity
$ kubectl exec -it ubuntu -- bash
  # apt update && apt install traceroute -y
  # traceroute nginx.default.svc.cluster.local

namelessvoid avatar Jun 22 '21 11:06 namelessvoid

Did a third test by installing the cluster from the example terraform files.

Kubeone manifest looks like this:

apiVersion: kubeone.io/v1beta1
kind: KubeOneCluster

versions:
  kubernetes: '1.20.6'

cloudProvider:
  hetzner: {}
  external: true

addons:
  enable: true
  path: "./addons"

For the test, ./addons was empty.

I tried both a cluster with a single worker node and a cluster with two worker nodes. The traceroute results remain the same: traffic is routed via the public IPs.

I'm attaching some screenshots of the networking setup from the Hetzner Cloud Console. This should be set up correctly, shouldn't it?

(three screenshots of the Hetzner Cloud Console network configuration attached)

I'm happy for any ideas for further debugging! Thank you a lot! :)

namelessvoid avatar Jun 22 '21 12:06 namelessvoid

Ok, maybe I found something - sorry for not thinking about this earlier!

When I traceroute the pod IP as you did, @kron4eg, I also see the traffic using the overlay IP:

$ traceroute 10.244.8.36
traceroute to 10.244.8.36 (10.244.8.36), 30 hops max, 60 byte packets
 1  static.XXX.XXX.XXX.162.clients.your-server.de (162.XXX.XXX.XXX)  0.132 ms  0.033 ms  0.021 ms
 2  10.244.8.0 (10.244.8.0)  3.768 ms  3.641 ms  3.543 ms
 3  10-244-8-36.nginx.default.svc.cluster.local (10.244.8.36)  3.622 ms  3.497 ms  3.490 ms

I'm still confused, though, why the public IP shows up in the trace.

But when accessing the service exposing the very same pod, it seems to take the public route again:

$ traceroute 10.109.255.202
traceroute to 10.109.255.202 (10.109.255.202), 30 hops max, 60 byte packets
 1  static.XXX.XXX.XXX.162.clients.your-server.de (162.55.166.14)  0.080 ms  0.039 ms  0.022 ms
 2  172.31.1.1 (172.31.1.1)  10.880 ms  9.905 ms  10.592 ms
 3  11202.your-cloud.host (159.69.96.89)  0.447 ms  0.332 ms  0.320 ms
 4  * * *
 5  spine2.cloud2.fsn1.hetzner.com (213.239.225.45)  1.018 ms spine1.cloud2.fsn1.hetzner.com (213.239.225.41)  0.958 ms spine2.cloud2.fsn1.hetzner.com (213.239.225.45)  1.263 ms
 6  core23.fsn1.hetzner.com (213.239.239.137)  13.665 ms  2.714 ms core24.fsn1.hetzner.com (213.239.239.129)  4.106 ms
 7  core11.nbg1.hetzner.com (213.239.203.125)  7.735 ms core12.nbg1.hetzner.com (213.239.203.121)  10.383 ms core11.nbg1.hetzner.com (213.239.203.125)  16.566 ms
 ...

So maybe some setting for the service overlay is not correct?

@kron4eg Could you maybe retry this on your end to confirm this? Thank you a lot!

namelessvoid avatar Jun 28 '21 11:06 namelessvoid

I'll try to reproduce

kron4eg avatar Jun 28 '21 11:06 kron4eg

@namelessvoid I still can't replicate that behaviour (using master build). Could you please attach your manifests (workloads/services/etc)?

kron4eg avatar Jul 09 '21 13:07 kron4eg

@kron4eg Sorry for the late response, got some stuff in my way in between...

There is nothing special, I believe:

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: nginx
  name: nginx
  namespace: default
spec:
  containers:
  - image: nginx
    name: nginx
---
apiVersion: v1
kind: Service
metadata:
  labels:
    run: nginx
  name: nginx
  namespace: default
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    run: nginx
  type: ClusterIP

namelessvoid avatar Jul 21 '21 13:07 namelessvoid

I can confirm this issue.

kubectl get nodes -o wide
NAME                        STATUS   ROLES                  AGE   VERSION   INTERNAL-IP   EXTERNAL-IP      OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
t1-control-plane-1          Ready    control-plane,master   80m   v1.21.3   10.8.0.2      188.34.X.X    Ubuntu 20.04.2 LTS   5.4.0-77-generic   containerd://1.4.8
t1-pool1-54f9cd8694-drz4m   Ready    <none>                 77m   v1.21.3   10.8.0.3      162.55.X.X   Ubuntu 20.04.2 LTS   5.4.0-77-generic   containerd://1.4.8

Testing with the manifests @namelessvoid provided in their last post:

root@ubuntu:/# traceroute 10.244.1.2
traceroute to 10.244.1.2 (10.244.1.2), 30 hops max, 60 byte packets
 1  static.103.165.55.162.clients.your-server.de (162.55.X.X)  0.100 ms  0.030 ms  0.065 ms
 2  10-244-1-2.nginx.default.svc.cluster.local (10.244.1.2)  0.223 ms  0.063 ms  0.069 ms

The first hop (162.55.X.X) is the external IP of the node. That should be 10.8.0.3 instead.

EDIT: OK, I suppose it was a false alarm. Pods keep talking to each other even though I'm now blocking all external traffic to the nodes. I'm still confused that the external IP shows up in the traceroute, though.

Lykos153 avatar Jul 22 '21 16:07 Lykos153

The first hop (162.55.X.X) is the external IP of the node

It's the node's own IP. This IP is the default route for pods.
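This can be checked from inside a pod: the default route points at the node's own (primary) IP, which on Hetzner is the public one. A minimal check (sketch; the ubuntu debug pod from earlier in the thread is assumed):

```shell
# inside the ubuntu debug pod (iproute2 is not in the base image)
apt update && apt install -y iproute2
ip route show default
# the gateway printed is the node's primary IP, hence it appears as hop 1
```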

kron4eg avatar Jul 23 '21 05:07 kron4eg

Can we somehow configure the internal IP to be the node's IP? Yesterday I said

Pods keep talking to each other even though I'm now blocking all external traffic to the nodes.

but that is only true if I use the SDN firewall provided by Hetzner. When I use iptables on the nodes to block all incoming traffic on the interface eth0, the pods can't communicate anymore.

I'd actually like to be able to disable the public interface completely. Is that somehow feasible with kubeone?
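For reference, the kind of host-level rules involved look roughly like this. This is only a sketch: the interface names (eth0 public, ens10 private) and the 192.168.0.0/16 private range are assumptions that depend on the image and network setup.

```shell
# allow traffic from the Hetzner private network
iptables -A INPUT -i ens10 -s 192.168.0.0/16 -j ACCEPT
# keep replies to outbound connections working on the public interface
iptables -A INPUT -i eth0 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# drop everything else arriving on the public interface
iptables -A INPUT -i eth0 -j DROP
```

With rules like these in place, any cluster component still using the public IPs (DNS, service traffic) breaks, which is what this issue describes.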

Lykos153 avatar Jul 23 '21 08:07 Lykos153

@Lykos153 I suppose it can be achieved by using custom images.

kron4eg avatar Jul 23 '21 09:07 kron4eg

I can now say for sure that DNS traffic is still routed via the public interface. With all incoming public connections blocked, pods can reach each other via IP but not via service hostnames. Also, every request from pods to the internet has a ~5 s delay due to a DNS timeout. The cluster is not usable unless I open ports 9*53 on the public network. I'm gonna try to get rid of the public interface using a custom image, as you suggested. The issue remains, however.

Lykos153 avatar Jul 27 '21 00:07 Lykos153

Any update here? We have the same issue.

We need to whitelist the public IP ranges as trusted IPs in our ingress to make the proxy protocol work.

ErwinSteffens avatar Oct 28 '21 10:10 ErwinSteffens

Same issue, makes firewalling horrible. I have manually patched kubeconfigs to use the private IP... Maybe the kubeadm args can be overridden somewhere.

alam0rt avatar Dec 04 '21 00:12 alam0rt

@alam0rt did it help?

kron4eg avatar Dec 04 '21 06:12 kron4eg

@alam0rt did it help?

It helps, but it gets overridden on upgrade as the kubeadm config is regenerated.

For the time being I am just adding the public IPs to the rules using

data "hcloud_servers" "nodes" {
  with_selector = "role=node"
}

locals {
  node_public_ipv4 = [for node in data.hcloud_servers.nodes.servers : join("/", [node.ipv4_address, "32"])]
}

alam0rt avatar Jan 22 '22 23:01 alam0rt

The admin kubeconfig is generated using the value from the terraform output kubeone_api. By default this value is the public IP of the kube-apiserver load balancer. I don't see how hcloud_load_balancer can give you an internal IP.

output "kubeone_api" {
  description = "kube-apiserver LB endpoint"

  value = {
    endpoint = hcloud_load_balancer.load_balancer.ipv4
    apiserver_alternative_names = var.apiserver_alternative_names
  }
}

kron4eg avatar Jan 23 '22 06:01 kron4eg


There definitely is a private IP that can be used. I'll give it a go soon and see what happens.

alam0rt avatar Jan 24 '22 08:01 alam0rt

So, it looks like you can use

output "kubeone_api" {
  # ...
  value = {
    endpoint = hcloud_load_balancer.load_balancer.network_ip
  }
}

network_ip is defined here: https://github.com/hetznercloud/terraform-provider-hcloud/blob/d6f4207b2b75b76e007bd08602e6dcbfb1740032/internal/loadbalancer/resource.go#L406

but is apparently undocumented!

alam0rt avatar Jan 24 '22 08:01 alam0rt

OK, having the INTERNAL IP as the kube-apiserver endpoint means that kubeconfigs for the whole system will contain that IP, including the admin config. KubeOne will work around that, it's not an issue (we always tunnel kube-apiserver requests via SSH).

However, your local kubectl might have a problem, but worry not, kubeone proxy to the rescue! kubeone proxy will create a pass-through SSH-tunnel proxy that kubectl can easily leverage with export HTTPS_PROXY=http://....
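A typical invocation might look like this. This is a sketch only: the local listen address/port and the file names are assumptions, so check kubeone proxy --help for the actual flags and defaults.

```shell
# start the SSH-tunnel proxy in one terminal (manifest/state file names are examples)
kubeone proxy --manifest kubeone.yaml --tfjson tf.json

# in another terminal, point kubectl at the proxy
export HTTPS_PROXY=http://127.0.0.1:8888
kubectl get nodes
```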

kron4eg avatar Jan 24 '22 09:01 kron4eg

Speaking of which, is there a good way to regenerate all of the kubeconfigs? I have updated the terraform output and ran kubeone apply --manifest kubeone.yaml -t new.json, but I don't think anything was updated. Maybe I need to force upgrade?

alam0rt avatar Jan 24 '22 09:01 alam0rt

No, I don't think it's possible, at least not with kubeadm. You'd need to create a new cluster.

kron4eg avatar Jan 24 '22 09:01 kron4eg

Damn! New cluster it is I guess.

alam0rt avatar Jan 24 '22 10:01 alam0rt

I mean, it can be done manually, but it's quite possible you'll kill your cluster. But if you'd like to try, here's how:

  • You'd need to regenerate the certificates for kube-apiserver, with a new SAN list that includes the internal IP of the load balancer.
  • Then replace all the kubelets' kubeconfigs to point to this new LB.
  • Then replace the kube-proxy config in the ConfigMap, restart all the kube-proxies across the cluster, and pray the cluster is not dead after this.
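The first two steps could be sketched with kubeadm roughly as below. Assumptions: default kubeadm paths, and 192.168.0.6 standing in for the load balancer's internal IP; run on each control-plane node. This is not a tested procedure, just an outline of the phases involved.

```shell
# 1. regenerate the apiserver serving cert with the internal LB IP in the SANs
mv /etc/kubernetes/pki/apiserver.crt /etc/kubernetes/pki/apiserver.crt.bak
mv /etc/kubernetes/pki/apiserver.key /etc/kubernetes/pki/apiserver.key.bak
kubeadm init phase certs apiserver --apiserver-cert-extra-sans 192.168.0.6

# 2. regenerate the kubelet kubeconfig pointing at the internal endpoint
kubeadm init phase kubeconfig kubelet --control-plane-endpoint 192.168.0.6:6443
systemctl restart kubelet
```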

kron4eg avatar Jan 24 '22 11:01 kron4eg

But I highly recommend not doing this in the cluster that has anything valuable running under it.

kron4eg avatar Jan 24 '22 11:01 kron4eg

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubermatic-bot avatar May 17 '22 00:05 kubermatic-bot

/remove-lifecycle stale Docs are still pending.

xmudrii avatar May 17 '22 06:05 xmudrii