
Change the internal IP

Open ArseniiPetrovich opened this issue 3 years ago • 41 comments

Can I somehow change the IP that the node is using as its internal IP? My nodes have a VPN interconnection, and they seem to communicate well through the VPN. However, when it comes to setting up addons like DNS I'm getting a "No route to host" error. I tracked it down to microk8s trying to use the nodes' "Internal IPs" instead of their VPN IPs.

ArseniiPetrovich avatar Jul 03 '21 01:07 ArseniiPetrovich

@ArseniiPetrovich could you provide an example with IPs in your setup?

Normally the pods get an IP, and the CNI and kube-proxy make sure communication is established among them. At which point do you get the "No route to host" error?

ktsakalozos avatar Jul 05 '21 09:07 ktsakalozos

https://drive.google.com/drive/folders/18GFyEQ9gf6seLBfM72rUbdBQGuUQ-PXp?usp=sharing

Here are 3 tarballs from all our nodes. Does that answer your question? I want the nodes to connect over their 10.11.11.x addresses, but they connect over their primary addresses instead, either internal or external ones.

ArseniiPetrovich avatar Jul 05 '21 10:07 ArseniiPetrovich

So I get an error when doing microk8s enable/disable dns.

ArseniiPetrovich avatar Jul 05 '21 10:07 ArseniiPetrovich

FYI, one of the nodes is currently out of the cluster; I'm testing different options to fix this, but it does not work with 2 nodes either.

ArseniiPetrovich avatar Jul 05 '21 11:07 ArseniiPetrovich

I just ran into the same issue attempting this process:

  1. Installed private interfaces on all Ubuntu servers using 10.99.x.x addresses (using Netmaker, though this shouldn't matter)
  2. Installed MicroK8s on top
  3. Used the add-node command and selected the private interface
  4. Ran the join command on the next node
  5. Repeat steps 3-4 on all nodes
  6. Install CoreDNS (microk8s enable dns)

Enabling DNS
Applying manifest
serviceaccount/coredns created
configmap/coredns created
deployment.apps/coredns created
service/kube-dns created
clusterrole.rbac.authorization.k8s.io/coredns created
clusterrolebinding.rbac.authorization.k8s.io/coredns created
Restarting kubelet
Adding argument --cluster-domain to nodes.
Configuring node 182.166.77.151
Failed to reach node.
HTTPSConnectionPool(host='182.166.77.151', port=25000): Max retries exceeded with url: /cluster/api/v1.0/configure (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f2eca145a58>: Failed to establish a new connection: [Errno 110] Connection timed out',))

At this point it fails trying to reach the first node. Running "microk8s kubectl get nodes -o wide" I see that all nodes are using their default interface IPs as the "internal IP" and no "external IP" is set. The first node that the CoreDNS addon is attempting to reach should be reached over 10.99.0.2, but it is using 182.166.77.151, the default address for that VM.

So to clarify, the issue is that the node's "internal IP" is taken from the default interface rather than from the private network. Is there any way to specify the advertised IP of the node at install time (or post-install)? I see there is an option to use the loopback address, but that seems to be it.

afeiszli avatar Jul 06 '21 19:07 afeiszli

Update: to set the internal IPs you can run the following (as root on each node):

echo --node-ip=$(ip address show dev YOUR_PRIVATE_INTERFACE | grep 'inet ' | awk -F ' ' '{print $2}' | sed 's/\/24//g') >> /var/snap/microk8s/current/args/kubelet; microk8s stop; microk8s start

Replace YOUR_PRIVATE_INTERFACE with the private interface name. I don't know if this is correct or causes any other issues, but it at least sets the internal IP correctly.
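Note that the sed above only strips a /24 prefix. A variant that strips any prefix length (just a sketch, assuming the interface carries a single IPv4 address; YOUR_PRIVATE_INTERFACE is still a placeholder):

# run as root, then restart MicroK8s for the new kubelet argument to take effect
echo "--node-ip=$(ip -4 -o addr show dev YOUR_PRIVATE_INTERFACE | awk '{print $4}' | cut -d/ -f1)" >> /var/snap/microk8s/current/args/kubelet
microk8s stop; microk8s start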

afeiszli avatar Jul 06 '21 20:07 afeiszli

@ktsakalozos Is there a valid way to do this? I attempted the following to solve my issue:

  1. Hardcode the IP in the Calico config (/var/snap/microk8s/current/args/cni-network/cni.yaml)
  2. Hardcode the IP in the API-server config by setting --advertise-address in the /var/snap/microk8s/current/args/kube-apiserver
  3. Hardcode the IP in the Kubelet config by setting --node-ip in the /var/snap/microk8s/current/args/kubelet

It worked when I started the node, but when I tried to join these nodes together it went completely nowhere with the following error logs:

Jul  6 23:46:39 one microk8s.daemon-kubelite[109803]: E0706 23:46:39.564207  109803 available_controller.go:508] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.152.183.165:443/apis/metrics.k8s.io/v1beta1: Get "https://10.152.183.165:443/apis/metrics.k8s.io/v1beta1": dial tcp 10.152.183.165:443: connect: no route to host
Jul  6 23:46:39 one microk8s.daemon-kubelite[109803]: E0706 23:46:39.633782  109803 kubelet.go:2291] "Error getting node" err="node \"one.us.vpc\" not found"
Jul  6 23:46:39 one microk8s.daemon-kubelite[109803]: E0706 23:46:39.734625  109803 kubelet.go:2291] "Error getting node" err="node \"one.us.vpc\" not found"
Jul  6 23:46:39 one microk8s.daemon-kubelite[109803]: E0706 23:46:39.834942  109803 kubelet.go:2291] "Error getting node" err="node \"one.us.vpc\" not found"
Jul  6 23:46:39 one microk8s.daemon-kubelite[109803]: E0706 23:46:39.936713  109803 kubelet.go:2291] "Error getting node" err="node \"one.us.vpc\" not found"
Jul  6 23:46:40 one microk8s.daemon-kubelite[109803]: E0706 23:46:40.037661  109803 kubelet.go:2291] "Error getting node" err="node \"one.us.vpc\" not found"
Jul  6 23:46:40 one microk8s.daemon-kubelite[109803]: E0706 23:46:40.138446  109803 kubelet.go:2291] "Error getting node" err="node \"one.us.vpc\" not found"
Jul  6 23:46:40 one microk8s.daemon-kubelite[109803]: E0706 23:46:40.239243  109803 kubelet.go:2291] "Error getting node" err="node \"one.us.vpc\" not found"
Jul  6 23:46:40 one microk8s.daemon-kubelite[109803]: E0706 23:46:40.339734  109803 kubelet.go:2291] "Error getting node" err="node \"one.us.vpc\" not found"
Jul  6 23:46:40 one microk8s.daemon-kubelite[109803]: E0706 23:46:40.440353  109803 kubelet.go:2291] "Error getting node" err="node \"one.us.vpc\" not found"
Jul  6 23:46:40 one microk8s.daemon-kubelite[109803]: E0706 23:46:40.541775  109803 kubelet.go:2291] "Error getting node" err="node \"one.us.vpc\" not found"
Jul  6 23:46:40 one microk8s.daemon-kubelite[109803]: E0706 23:46:40.542045  109803 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://127.0.0.1:16443/api/v1/services?limit=500&resourceVersion=0": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "10.152.183.1")

ArseniiPetrovich avatar Jul 06 '21 20:07 ArseniiPetrovich

@afeiszli check my answer below. It works for one node, but when you try to join these nodes together it breaks the whole cluster.

ArseniiPetrovich avatar Jul 06 '21 20:07 ArseniiPetrovich

Jul  6 23:59:59 one microk8s.daemon-kubelite[134568]: E0706 23:59:59.270000  134568 kubelet_node_status.go:586] "Failed to set some node status fields" err="failed to validate nodeIP: node IP: \"10.11.11.5\" not found in the host's network interfaces" node="one.us.vpc"
Jul  7 00:00:00 one microk8s.daemon-kubelite[134568]: E0707 00:00:00.876002  134568 available_controller.go:508] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.152.183.165:443/apis/metrics.k8s.io/v1beta1: Get "https://10.152.183.165:443/apis/metrics.k8s.io/v1beta1": dial tcp 10.152.183.165:443: connect: no route to host

It actually worked on the second try, but still produced some weird logs...

ArseniiPetrovich avatar Jul 06 '21 21:07 ArseniiPetrovich

Realized that, for some reason, after joining the cluster the --node-ip in the kubelet file of the joining node changes to the IP of the node it joins. For example: node 1 (ip 1) runs microk8s add-node, node 2 (ip 2) runs microk8s join ..., and node 2 ends up with ip1 in its kubelet config.

ArseniiPetrovich avatar Jul 06 '21 21:07 ArseniiPetrovich

Okay, I was able to join all the nodes and then manually override their IP addresses. However, when trying to run some pods I'm getting the following error:


Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "b9b97afc5e9b38f8dd320c90e1fe96bb6268ef46df376ff694730718deea03b2": error getting ClusterInformation: Get https://[10.152.183.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.152.183.1:443: connect: no route to host

I flushed iptables as suggested in other issues. I'm still getting the same error, though from time to time it's a timeout instead of "no route". The weirdest thing about all this is that pods do start occasionally, so it's the kind of bug that is sometimes there and sometimes isn't. But 99% of the time it is there.

ArseniiPetrovich avatar Jul 06 '21 21:07 ArseniiPetrovich

[from the Netmaker side]: Have you checked with a simple ping to make sure nodes are still able to talk to each other over the private addresses (you can also run "wg show" and check the last handshake)? If they are not, it may be necessary to restart the network interface after an iptables flush depending on the rules that were put in place, which can be done with "netclient pull -n " on each node. Also make sure ip forwarding is enabled.
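For reference, a minimal set of checks along those lines (a sketch; wg0 and 10.99.0.2 are placeholders for your WireGuard interface and a peer's VPN address):

# ping a peer over its VPN address
ping -c 3 10.99.0.2
# check the last WireGuard handshake for each peer
sudo wg show
# verify IP forwarding is enabled (should print 1)
sysctl net.ipv4.ip_forward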

afeiszli avatar Jul 07 '21 12:07 afeiszli

Can you make sure (probably by editing the /etc/hosts on each node) that the hostname of each node resolves to the IP on the VPN?

ktsakalozos avatar Jul 08 '21 20:07 ktsakalozos

This fixed the ClusterIP issue for me. For posterity, a quick hack to run on each node that takes all the node hostnames and advertised addresses and inserts them into /etc/hosts:

echo "$(microk8s kubectl get nodes -o wide | awk 'NR>1{ print $6, $1 }')" >> /etc/hosts

afeiszli avatar Jul 08 '21 20:07 afeiszli

@afeiszli Did this solve the timeout issue we discussed today? Because it didn't help me much, to be honest.

ArseniiPetrovich avatar Jul 08 '21 22:07 ArseniiPetrovich

@ArseniiPetrovich, @ktsakalozos mentioned that hostnames may need to be reachable before running the cluster install. I've reinstalled, this time setting up the hostnames first. I still received a timeout, but it appears to be better, at least for now.

afeiszli avatar Jul 09 '21 16:07 afeiszli

I believe I've found a key piece of this. If you do a "kubectl describe svc kubernetes -n default", you will see the endpoints are still pointing to the public IPs. To fix this you must stop each node (microk8s stop) and edit "/var/snap/microk8s/current/args/kube-apiserver", adding the following line: --advertise-address=<the node's private IP>

When the node comes back up, the endpoint should be correct. This fixed one of my remaining timeout errors. I'll leave the cluster up for a while to see if other errors are encountered, but the network seems to be behaving now.
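A minimal sketch of that change, assuming this node's VPN address is 10.99.0.2 (substitute each node's own private IP):

microk8s stop
echo '--advertise-address=10.99.0.2' >> /var/snap/microk8s/current/args/kube-apiserver
microk8s start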

afeiszli avatar Jul 09 '21 16:07 afeiszli

@afeiszli When I did this last time I received an error about certificates, and new nodes were not able to join the MicroK8s server. Can you try adding a new node under this setup?

ArseniiPetrovich avatar Jul 12 '21 21:07 ArseniiPetrovich

@afeiszli @ktsakalozos Small friendly ping here :)

ArseniiPetrovich avatar Jul 16 '21 11:07 ArseniiPetrovich

Same problem here; it's incredible that so many Kubernetes distributions have the same problem. I could only get it to work with K3s. It would be so simple to provide a --node-ip flag, or even to respect the interface used when joining the first node.

dannyfranca avatar Aug 28 '21 14:08 dannyfranca

Just for the issue record, my use case is: I have VPSs in different cloud providers and bare-metal machines with residential IPs, so I don't have a common private network. To solve this, I'm using WireGuard to wrap all machines in the same private network (interface wg0).

What I expect: every node that joins through the IP provided by the wg0 interface gets the wg0 address as its internal IP. What I get: the main Ubuntu interface's IP.

Any way to make it happen?

I just need every node to talk to the others through a chosen interface, or at least through the one used in the join command.

dannyfranca avatar Aug 28 '21 14:08 dannyfranca

Thanks to everyone providing bits of advice in this thread.

I'm also using microk8s to run a cluster where every node is running in a different VPS from a different provider. The VPSs are on a tinc network together, forming subnet 10.x.y.z/24. So the interface I want to work on (tun1) is not the default interface.

This is what worked for me:

On every node (including the master(s)):

  1. microk8s stop (Stop all nodes before changing configuration files)
  2. Get the VPN IP of the node, e.g. 10.x.y.z. Command ip a show dev tun1 will show info for interface tun1.
  3. Add this to the bottom of /var/snap/microk8s/current/args/kubelet:
--node-ip=10.x.y.z
  4. Add this to the bottom of /var/snap/microk8s/current/args/kube-apiserver:
--advertise-address=10.x.y.z
  5. microk8s start

Now I see the correct values in the INTERNAL-IP column with microk8s kubectl get nodes -o wide.
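For convenience, here are the same steps as a small script (a sketch only, assuming the VPN interface is tun1 and carries a single IPv4 address):

# run as root on every node; adjust VPN_IF to your interface name
VPN_IF=tun1
VPN_IP=$(ip -4 -o addr show dev "$VPN_IF" | awk '{print $4}' | cut -d/ -f1)
microk8s stop
echo "--node-ip=$VPN_IP" >> /var/snap/microk8s/current/args/kubelet
echo "--advertise-address=$VPN_IP" >> /var/snap/microk8s/current/args/kube-apiserver
microk8s start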

rudolfbyker avatar Oct 25 '21 12:10 rudolfbyker

Realized that, for some reason, after joining the cluster the --node-ip in the kubelet file of the joining node changes to the IP of the node it joins. For example: node 1 (ip 1) runs microk8s add-node, node 2 (ip 2) runs microk8s join ..., and node 2 ends up with ip1 in its kubelet config.

I'm also seeing this. I have to change /var/snap/microk8s/current/args/kubelet again after joining to make the changes stick. Obviously a bug!
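A workaround sketch for re-applying the setting after a join overwrites it (10.x.y.z is this node's VPN address):

# replace whatever --node-ip the join wrote with this node's VPN address, then restart
sed -i 's/^--node-ip=.*/--node-ip=10.x.y.z/' /var/snap/microk8s/current/args/kubelet
microk8s stop; microk8s start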

rudolfbyker avatar Oct 25 '21 18:10 rudolfbyker

Hello all,

I have the same issue here. My workers have two interfaces and microk8s join picks the "secondary" one. I tried several of the suggestions here, but none of them worked. Apparently, kubelet ignores the --node-ip flag, as well as the BGP IP detection method in the cni-network/cni.yaml file.

In my case, the ubuntu1 is a simple worker.

~$ kubectl get nodes -o wide
NAME      STATUS   ROLES    AGE     VERSION                    INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
ubuntu1   Ready    <none>   15m     v1.23.1-2+b7120affd6631a   10.162.X.Y   <none>        Ubuntu 20.04.3 LTS   5.4.0-1050-raspi   containerd://1.5.7
dellemc   Ready    <none>   3h24m   v1.23.1-2+9e8d1bf4219080   172.30.248.86   <none>        Ubuntu 20.04.3 LTS   5.4.0-89-generic   containerd://1.5.7

and it has the following, routes:

# sudo ip route list
default via 169.254.66.80 dev eth0 proto static 
default via 10.162.16.1 dev wlan0 proto dhcp src 10.162.X.Y metric 600 
...

I have followed this thread and the related ones but could not find a solution. I'd appreciate some help with this configuration. Best
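One thing that may be worth trying in a multi-interface setup like this (a suggestion, not something verified in this thread): point Calico's IP autodetection at the interface you want it to use. Here eth0 is only a placeholder for that interface:

# IP_AUTODETECTION_METHOD is a standard Calico setting; calico-node runs in kube-system on MicroK8s
microk8s kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=interface=eth0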

wisaaco avatar Jan 24 '22 16:01 wisaaco

Only k3s solves that problem. Incredible how this simple use case is a huge problem across many frameworks.

dannyfranca avatar Jan 24 '22 16:01 dannyfranca

Hi @wisaaco

This is most likely related to having two default interfaces on your machines. Can you elaborate on the steps that were taken when creating the cluster? (That is, the microk8s add-node and microk8s join commands that you used to form the cluster.)

Please also include the contents of /var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml. This will help with tracking down why this happens. Thanks!

neoaggelos avatar Jan 24 '22 16:01 neoaggelos

Hi @neoaggelos ,

In one of the attempts, I tried to remove one default interface in a fresh installation but the snap.microk8s.daemon-kubelite service did not start.

In any case, I followed these steps:

  • In the master:
$ microk8s add-node
From the node you wish to join to this cluster, run the following:
microk8s join 172.30.248.86:25000/587970307c112959444429819caa0abc/e31414f43733

Use the '--worker' flag to join a node as a worker not running the control plane, eg:
microk8s join 172.30.248.86:25000/587970307c112959444429819caa0abc/e31414f43733 --worker

If the node you are adding is not reachable through the default interface you can use one of the following:
microk8s join 172.30.248.86:25000/587970307c112959444429819caa0abc/e31414f43733
microk8s join 172.17.0.1:25000/587970307c112959444429819caa0abc/e31414f43733
microk8s join 192.168.49.1:25000/587970307c112959444429819caa0abc/e31414f43733
microk8s join 172.18.0.1:25000/587970307c112959444429819caa0abc/e31414f43733
microk8s join fc00:f853:ccd:e793::1:25000/587970307c112959444429819caa0abc/e31414f43733

My master also has two interfaces:

2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 2c:ea:7f:e0:17:70 brd ff:ff:ff:ff:ff:ff
    inet 172.30.248.86/22 brd 172.30.251.255 scope global dynamic eno1
3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 2c:ea:7f:e0:17:71 brd ff:ff:ff:ff:ff:ff
    inet 169.254.66.80/16 brd 169.254.255.255 scope global eno2
  • And from the output, I chose this command to run on the worker: microk8s join 172.30.248.86:25000/aa48a0e60504b58ab9a0d6191206f1ae/e31414f43733 --worker. If I instead choose microk8s join 192.168.49. ..., the worker connects to the master but it is not visible from the master.

Note all devices ping each other. In the master, eno1 provides internet access and eno2 is local. The workers are connected via eno2 to the master.

The content of /var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml is:

$ cat /var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml
- Address: 172.30.248.86:19001
  ID: 3297041220608546238
  Role: 0

Thanks!

wisaaco avatar Jan 25 '22 08:01 wisaaco

I'm having issues after deleting a second node from an HA cluster without it "leaving" the cluster first. This happens because I'm using the Kubernetes autoscaler.

Behavior: after the second node is removed, the master is stuck forever.

Notes:

  • I have two interfaces, one public and another internal (vpn)
  • I tried removing the second node's reference from the /var/snap/microk8s/current/var/kubernetes/backend/*.yaml files manually
  • I tried restarting the master node
  • My internal network (vpc) is working.

I haven't had any success.

Thanks in advance for the help.

inspection-report-20220423_054452.tar.gz

ephillipe avatar Apr 23 '22 05:04 ephillipe

I found a way to monkey-patch the network routes by basically changing the default interface. However, this should not be considered a solution since it BREAKS SSH.

Supposing this is the output of route -n

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.255.255.1    0.0.0.0         UG    0      0        0 ens192
10.1.68.128     10.1.68.128     255.255.255.192 UG    0      0        0 vxlan.calico
10.152.183.0    192.168.1.1     255.255.255.0   UG    0      0        0 ens224
10.255.255.1    0.0.0.0         255.255.255.255 UH    0      0        0 ens192
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 ens224
# Remove the default WAN interface (ens192 in my case)
sudo route del -net 0.0.0.0 gw 10.255.255.1 netmask 0.0.0.0 dev ens192 => WILL BREAK SSH

# Change the default interface to redirect default traffic to LAN interface (ens224 in my case)
sudo ip route add default via 192.168.1.1 dev ens224

If you did this on a server and want to get SSH back, you have to run this from a KVM console. (I was there Gandalf)

# Set rule to redirect default traffic to ens192 WAN interface (Will restore SSH)
sudo ip route add default via 10.255.255.1 dev ens192

usersina avatar May 03 '22 18:05 usersina

Realized that for some reason after joining cluster --node-ip in the Kubelet file of joining node changes to the IP of the node where it joins. For example: node 1 (ip 1): microk8s add-node node 2 (ip 2): microk8s join ... node 2 get the ip1 for in kubelet config

I'm also seeing this. I have to change /var/snap/microk8s/current/args/kubelet again after joining to make the changes stick. Obviously a bug!

I tried experimenting with the --address option in /var/snap/microk8s/current/args/kubelet, set to the address of the interface I want to use (192.168.1.0 for example), but it didn't work out. I also tried --bind-address instead of --advertise-address, but no luck there either.

The reason I'm trying that is that if you have dns enabled and you want to join a node without changing the node IP first, it always times out...
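A quick way to check whether the joining node can actually reach the existing node's cluster agent over the intended address (a sketch; 10.99.0.2 is a hypothetical VPN address, and port 25000 is the cluster-agent port seen earlier in this thread):

nc -zv 10.99.0.2 25000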

usersina avatar May 11 '22 10:05 usersina