
Unable to communicate between pods on different nodes if master node is behind NAT

Open lo00l opened this issue 2 years ago • 14 comments

Environmental Info: K3s Version:

k3s -v
k3s version v1.23.6+k3s1 (418c3fa8)
go version go1.17.5

Node(s) CPU architecture, OS, and Version:

Master:

uname -a
Linux ip-172-31-12-196 5.15.0-1008-aws #10-Ubuntu SMP Wed May 18 17:28:39 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Worker:

uname -a
Linux <hostname> 5.4.0-105-generic #119-Ubuntu SMP Mon Mar 7 18:49:24 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:

1 master node, 1 worker node

Describe the bug:

Pods on different nodes are unable to communicate with each other when the master node is behind NAT.

Steps To Reproduce:

  • Installed K3s. Install k3s on the master node:
sudo curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="--tls-san <ip> --node-external-ip <ip> --kube-apiserver-arg external-hostname=<ip>" sh -

Install k3s on the worker node:

sudo curl -sfL https://get.k3s.io | K3S_URL=https://<master ip>:6443 K3S_NODE_NAME=worker K3S_TOKEN=<token> INSTALL_K3S_EXEC='--node-label worker=true --flannel-iface eth0 --debug -v 3' sh -

The master node is behind NAT. It has a public IP, but the IP on its eth0 interface is different (172.31.12.196).
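A quick way to see which addresses the cluster registered for each node (the flannel.alpha.coreos.com/public-ip annotation is the address flannel uses as the VXLAN endpoint for that node):

kubectl get nodes -o wide
kubectl get node <node-name> -o yaml | grep flannel.alpha.coreos.com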

I deploy two pods and two services, one on each node:

apiVersion: v1
kind: Pod
metadata:
  name: echo-devel
  namespace: echo
  labels:
    app: echo-devel
spec:
  nodeSelector:
    worker: "true"
  containers:
    - name: echo
      image: ealen/echo-server
      ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: devel
  namespace: echo
spec:
  selector:
    app: echo-devel
  ports:
    - port: 80
      name: "80"
  type: NodePort
---
apiVersion: v1
kind: Pod
metadata:
  name: echo-master
  namespace: echo
  labels:
    app: echo-master
spec:
  nodeSelector:
    node-role.kubernetes.io/master: "true"
  containers:
    - name: echo
      image: ealen/echo-server
      ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: master
  namespace: echo
spec:
  selector:
    app: echo-master
  ports:
    - port: 80
      name: "80"
  type: NodePort

Then I execute the following commands from a container on the master node:

telnet master.echo.svc.cluster.local 80
Connected to master.echo.svc.cluster.local
^C

telnet devel.echo.svc.cluster.local 80
^C

As you can see, it's unable to connect to the pod on the other (worker) node.

Then I execute the following command in the worker node's pod:

nslookup devel.echo.svc.cluster.local
;; connection timed out; no servers could be reached 

It can't reach the k3s DNS server.

Expected behavior:

Pods can connect to each other, whether or not they are on the same node.

Actual behavior:

Pods can't connect to each other if they are on different nodes.

Additional context / logs:

I tried adding these iptables rules:

iptables -I INPUT 1 -i cni0 -s 10.42.0.0/16 -j ACCEPT
iptables -I INPUT 1 -i cni0 -s 10.43.0.0/16 -j ACCEPT

But it didn't help.
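One way to narrow this down (a diagnostic sketch; 8472/udp is flannel's default VXLAN port) is to watch for the VXLAN traffic on both nodes while a pod tries to reach the other node:

sudo tcpdump -ni eth0 udp port 8472

If the packets leave the worker but never arrive on the master, the NAT/network is dropping them rather than iptables on the host.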

Master routes:

ip route
default via 172.31.0.1 dev eth0 proto dhcp src 172.31.12.196 metric 100
10.42.0.0/24 dev cni0 proto kernel scope link src 10.42.0.1
10.42.1.0/24 via 10.42.1.0 dev flannel.1 onlink
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.31.0.0/20 dev eth0 proto kernel scope link src 172.31.12.196 metric 100
172.31.0.1 dev eth0 proto dhcp scope link src 172.31.12.196 metric 100
172.31.0.2 dev eth0 proto dhcp scope link src 172.31.12.196 metric 100

Worker routes:

ip route
default via 172.31.1.1 dev eth0 proto dhcp src <worker-external-ip> metric 100
10.42.0.0/24 via 10.42.0.0 dev flannel.1 onlink
10.42.1.0/24 dev cni0 proto kernel scope link src 10.42.1.1
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.19.0.0/16 dev br-9c77cd708ba2 proto kernel scope link src 172.19.0.1
172.31.1.1 dev eth0 proto dhcp scope link src <worker-external-ip> metric 100

I also tried to launch a new cluster where the master node is not behind NAT (external IP = IP on the eth0 interface), with the same worker node, and everything worked fine. But I have to use an AWS server, which is behind NAT.

All firewalls (both on master and worker nodes) are disabled.

Backporting

  • [ ] Needs backporting to older releases

lo00l avatar Jun 15 '22 11:06 lo00l

1 master node, 1 worker node

Note that k3s has servers and agents, not masters and workers.

If you set --node-external-ip on a node, the other nodes need to be able to reach the node at that IP. From one of the agents, can you successfully curl -vks https://SERVER_EXTERNAL_IP:6443/ping ?

brandond avatar Jun 15 '22 19:06 brandond

Hi @brandond , thank you for the quick response.

Yes, sure, external node IP is reachable by the agent node.

curl -vks https://<ip>:6443/ping
*   Trying <ip>:6443...
* TCP_NODELAY set
* Connected to <ip> (<ip>) port 6443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: O=k3s; CN=k3s
*  start date: Jun 15 19:31:23 2022 GMT
*  expire date: Jun 15 19:31:23 2023 GMT
*  issuer: CN=k3s-server-ca@1655321483
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x55fdaa3898c0)
> GET /ping HTTP/2
> Host: <ip>:6443
> user-agent: curl/7.68.0
> accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!
< HTTP/2 200
< content-type: text/plain
< content-length: 4
< date: Wed, 15 Jun 2022 19:32:17 GMT
<
* Connection #0 to host <ip> left intact
pong

Moreover, when I run kubectl get nodes I can see both nodes connected.

lo00l avatar Jun 15 '22 19:06 lo00l

OK, if that's working, but traffic is not passing within the cluster, it's probably related to the CNI (flannel+vxlan). Are you sure that the correct ports are open between all the nodes? If all of the nodes are not on the same network and inter-node cluster traffic needs to transit the internet, you may want to look at using wireguard instead of vxlan, as vxlan is insecure and not designed for use outside a controlled LAN environment.

Also note that iptables-based firewalls (ufw/firewalld) should be disabled, as the CNI projects themselves do not support operation with them enabled.
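For reference, the inter-node ports k3s needs (per the k3s networking requirements docs) are 6443/tcp and 10250/tcp, plus 8472/udp for the flannel VXLAN backend or 51820/udp for wireguard-native. The TCP ones are easy to test from an agent:

nc -zv <server-external-ip> 6443
nc -zv <server-external-ip> 10250

UDP reachability (the part VXLAN actually depends on) can't be confirmed with a simple connect; capturing on the server with tcpdump while generating pod traffic from the agent is more reliable.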

brandond avatar Jun 15 '22 19:06 brandond

All the firewalls on both nodes are disabled.

Could the reason for this behaviour be that the server node is behind NAT? When I set up the same cluster with the server node not behind NAT, everything works well.

lo00l avatar Jun 15 '22 20:06 lo00l

If the NAT is not passing the VXLAN UDP packets correctly, that could be part of it, but it's hard to say. Flannel just configures the in-kernel VXLAN encapsulation; the code for that is not in k3s itself.
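A minimal way to inspect what flannel actually programmed into the kernel (standard iproute2 commands, not specific to k3s):

ip -d link show flannel.1
bridge fdb show dev flannel.1

The first shows the VXLAN ID, local endpoint IP, and UDP port; the second lists the remote VTEP entries, i.e. which node IP each peer's pod traffic is sent to.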

brandond avatar Jun 15 '22 20:06 brandond

I think I am hitting the same issue when deploying a multi-node k3s on Oracle Cloud. Server and agent are all Ubuntu 20.04; neither flannel nor calico works. For flannel, the following command makes pods pingable briefly, before the routing table gets reverted by k3s.

sudo ip r add 10.42.0.0/16 dev cni0

roylez avatar Aug 14 '22 00:08 roylez

I can confirm this bug. I spent the last three days trying to run a k3s server instance behind an internal TCP load balancer on GCP. The server and agent nodes were in different subnetworks, but were reachable to each other. All firewalls were off, and I tried all possible backends for flannel (only with ipsec was I at least able to resolve DNS in pods); I also tried calico with no success.

T-helper avatar Sep 02 '22 20:09 T-helper

You might try using the wireguard flannel backend (wireguard-native). vxlan, host-gw, and udp are all unlikely to properly transit across the internet or other networks that might mangle the packets (as well as being wildly insecure for use across the internet).

brandond avatar Sep 02 '22 21:09 brandond

Yeah, I'm experiencing this issue with Oracle Cloud as well, and when I set up the wireguard, everything works perfectly.

Alehap avatar Sep 03 '22 03:09 Alehap

Hi. In my case, the master node is NATed with a public IP, and the workers are NATed with no incoming public IPs.

So I set up wireguard on my master node, connected all the worker nodes to it, and then set up k3s inside my wireguard network; I also changed my flannel-backend to none, since everything was running inside my handcrafted wireguard.

Maybe it's more complicated than it should be, and for now the connections between pods do not work.

What is the essential way to make it work? Is setting flannel-backend to wireguard enough?

In my case, the workers are dispatched behind multiple DSL NATs around the globe.

Moreover, the K3s doc is offline.

UrielCh avatar Sep 16 '22 05:09 UrielCh

Moreover, the K3s doc is offline.

There was a temporary outage on the rancher website. This has been fixed. The new k3s docs are at https://k3s-io.github.io/docs/ pending a move to the main k3s.io domain.

I also changed my flannel-backend to none since everything was running inside my handcrafted wireguard.

That disables the flannel CNI entirely, leaving you to deploy your own CNI. Unless you are doing so, I would not use that setting.

You can probably get K3s working over an existing wireguard mesh, but we don't have any documentation on how to do so, as the preferred option is to let flannel do it for you. I would recommend not setting up your own wireguard tunnels, and instead using --flannel-backend=wireguard-native, along with setting --node-external-ip to the correct public address of each node so that they can connect to each other over the internet.
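Roughly, that recommendation translates to the following (a sketch with placeholder IPs and token):

Server:

curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --flannel-backend=wireguard-native --node-external-ip=<server-public-ip> --tls-san <server-public-ip>" sh -

Agent:

curl -sfL https://get.k3s.io | K3S_URL=https://<server-public-ip>:6443 K3S_TOKEN=<token> INSTALL_K3S_EXEC="--node-external-ip=<agent-public-ip>" sh -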

brandond avatar Sep 16 '22 19:09 brandond

Will --flannel-backend=wireguard-native work with worker nodes that are not reachable at a node-external-ip? If the internal wireguard-native setup is a mesh configuration, that will not work.

If I add some extra IP forwarding, I need at least to choose the wireguard port for each worker. Is there an option for that? The closest option is --advertise-port, but it is not related to the wireguard config.

My ISP's IPv6 config is broken, so I wish I could make it work over IPv4.

UrielCh avatar Sep 17 '22 04:09 UrielCh

Will --flannel-backend=wireguard-native work with worker nodes that are not reachable at a node-external-ip?

I'm not an expert on wireguard, but I believe that it should work as long as one end is reachable at the external IP, and both nodes know their external IP. There is work to support NAT hole punching for situations where neither end has a port open, but I don't know what the status is of that.

There are some other issues out there where users have worked to get the vxlan and host-gw backends working over existing wireguard meshes, but it's not something we document or support. You might search out those threads, and check out the --flannel-iface option.
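One quick way to tell whether the tunnels actually came up (assuming wireguard-tools is installed; flannel-wg is the interface name the wireguard-native backend creates on current releases):

sudo wg show flannel-wg

A recent "latest handshake" line per peer means the two nodes can reach each other's wireguard port.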

If I add some extra IP forwarding, I need at least to choose the wireguard port for each worker.

You can use --flannel-conf and --flannel-cni-conf to provide your own configuration files. See https://github.com/flannel-io/flannel/blob/master/Documentation/backends.md#wireguard for details on backend config options.
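A rough sketch of what such an override could look like (key names per the linked flannel backend docs; the file path and port 51821 are arbitrary examples):

sudo tee /etc/rancher/k3s/flannel-net-conf.json >/dev/null <<'EOF'
{
  "Network": "10.42.0.0/16",
  "Backend": {
    "Type": "wireguard",
    "ListenPort": 51821
  }
}
EOF

and then start k3s with --flannel-conf=/etc/rancher/k3s/flannel-net-conf.json.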

brandond avatar Sep 17 '22 05:09 brandond

I tried --flannel-backend=wireguard-native, but that failed too. Now I have switched my master to a new host that can see its public IP directly on its network interface. With that new configuration, it seems to work. I need more testing now; I will append my test results to this ticket later.

UrielCh avatar Sep 19 '22 11:09 UrielCh

My first cluster is still working fine, and I want to create a new one. I'm now trying to set up a master node on Oracle Cloud, and it did not work even with this setup:

export LOCAL_IP=10.0.0.XX; # IP on the local network
export PUB_IP=XX.XX.XX.XX; # public IP
export INSTALL_K3S_EXEC="server --flannel-backend=wireguard-native \
    --tls-san ${PUB_IP} \
    --node-name=master-prod \
    --node-ip=${LOCAL_IP} \
    --bind-address=${LOCAL_IP} \
    --advertise-address=${PUB_IP} \
    --node-external-ip=${PUB_IP}"
curl -sfL https://get.k3s.io | sh -;

I do not know if node-ip should be the local IP or the public IP; the doc says: "IP address to advertise for node".

I used the Ubuntu 22.04 image, which comes pre-configured with a restrictive firewall in /etc/iptables/rules.v4; I cleaned up those rules files, leaving a simple accept-all iptables setup.
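For reference, one way to do that cleanup (assuming the rules are managed by iptables-persistent, as on those images):

sudo iptables -F
sudo netfilter-persistent save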

UrielCh avatar Nov 01 '22 14:11 UrielCh

node-external-ip with wireguard-native works fine on Oracle, and those are the only options needed. However, it does not work with the latest k3s, last time I checked. I am pinning to version v1.24.6+k3s1.

roylez avatar Nov 02 '22 01:11 roylez

Thx, I'm going to try the previous version; for now, it failed with v1.25.3+k3s1. More tests, more scripts:

My setup script for Oracle Cloud is now:

#!/bin/bash
export EXTRA_TLS=additional.dns.for.CERT; # CHANGE ME
/usr/local/bin/k3s-uninstall.sh &>/dev/null; /usr/local/bin/k3s-agent-uninstall.sh &>/dev/null;
export LOCAL_IP=$(ip -4 a show dev enp0s3 | grep -oE 'inet [^/]+' | cut -d\  -f2);
export PUB_IP=$(dig +short myip.opendns.com @resolver1.opendns.com -4);
export INSTALL_K3S_VERSION=v1.25.3+k3s1; # failed
# export INSTALL_K3S_VERSION=v1.24.6+k3s1; # should pass
export INSTALL_K3S_EXEC="server --flannel-backend=wireguard-native \
    --flannel-external-ip \
    --tls-san ${EXTRA_TLS} --tls-san ${PUB_IP} \
    --node-ip=${LOCAL_IP} \
    --bind-address=${LOCAL_IP} \
    --advertise-address=${PUB_IP} \
    --node-name=master-prod \
    --node-external-ip=${PUB_IP}"
curl -sfL https://get.k3s.io | sh -;
sudo chown ${USER}: /etc/rancher/k3s/k3s.yaml;
echo
echo -e '\033[0;31mSETUP done\033[0m'
echo -e '\033[0;32mSetup nodes with:\033[0m'
echo
echo
echo '/usr/local/bin/k3s-uninstall.sh &> /dev/null; /usr/local/bin/k3s-agent-uninstall.sh &>/dev/null;'
echo "export INSTALL_K3S_VERSION=${INSTALL_K3S_VERSION};"
echo "export K3S_NODE_NAME=\$(hostname);"
echo "export K3S_TOKEN=$(sudo cat /var/lib/rancher/k3s/server/node-token);"
echo "export K3S_URL=https://${PUB_IP}:6443;";
echo 'export INSTALL_K3S_EXEC="--node-label adb=true";' # CHANGE ME TOO
echo 'curl -sfL https://get.k3s.io | sh -;'

-- updated --

UrielCh avatar Nov 02 '22 04:11 UrielCh

Can you clarify what exactly doesn't work?

brandond avatar Nov 02 '22 05:11 brandond

Can you clarify what exactly doesn't work?

Pods on different nodes cannot communicate with each other, in a fully NATed setup.

I'm now preparing a simple hello-world setup script.

UrielCh avatar Nov 02 '22 05:11 UrielCh

You might try out the new flag that was recently added: https://github.com/k3s-io/k3s/pull/6321
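That flag is passed to the server alongside the external addresses; a minimal sketch (placeholder IP):

INSTALL_K3S_EXEC="server --flannel-backend=wireguard-native --flannel-external-ip --node-external-ip=<server-public-ip>"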

brandond avatar Nov 02 '22 06:11 brandond

Demo ready: same init script, then:

kubectl apply -f https://raw.githubusercontent.com/UrielCh/dyn-ingress/main/demo.yml

This config contains:

  • Lots of comments :smile:
  • A DaemonSet of docker.io/traefik/whoami:v1.8
  • A strip-prefix Middleware for traefik
  • An empty Ingress with a dummy path /fake-url to make it valid.
  • A ServiceAccount / Role / RoleBinding to allow dyn-ingress to populate the dummy Ingress with routes to all the docker.io/traefik/whoami:v1.8 pods
  • My hand-crafted-with-love dyn-ingress sources.

Once deployed, accessing the k3s cluster with a browser will list all nodes in an HTML page, or return a JSON list of pods if requested from a script.

The dyn-ingress pod will output a nice activity summary in its logs, with ANSI color, but kubectl logs seems to drop all my colors 😞

Now let's try more test.

UrielCh avatar Nov 02 '22 10:11 UrielCh

:rainbow: --flannel-external-ip :rainbow: fixed the issue! Thx

UrielCh avatar Nov 02 '22 10:11 UrielCh