microk8s
Change the internal IP
Can I somehow change the IP that a node uses as its internal IP? My nodes have a VPN interconnection, and they seem to communicate well over the VPN. However, when it comes to setting up addons like DNS, I'm getting a "No route to host" error. I tracked it down to MicroK8s using the nodes' "internal IPs" instead of their VPN IPs.
@ArseniiPetrovich could you provide an example with IPs in your setup?
Normally the pods get an IP, and the CNI and kube-proxy make sure communication is established among them. At which point do you get the "No route to host" error?
https://drive.google.com/drive/folders/18GFyEQ9gf6seLBfM72rUbdBQGuUQ-PXp?usp=sharing
Here are three tarballs, one from each of our nodes. Does that answer your question?
I want the nodes to connect with their 10.11.11.x addresses, but instead they connect with their primary addresses, either internal or external ones.
So I get an error when running microk8s enable/disable dns.
FYI, one of the nodes is currently out of the cluster; I'm testing different options to fix this, but it doesn't work with two nodes either.
I just ran into the same issue attempting this process:
- Installed private interfaces on all Ubuntu servers using 10.99.x.x addresses (using Netmaker though this shouldn't matter)
- Installed MicroK8s on top
- Used the add-node command and selected the private interface
- Ran the join command on the next node
- Repeated steps 3-4 on all nodes
- Installed CoreDNS (microk8s enable dns)
Enabling DNS
Applying manifest
serviceaccount/coredns created
configmap/coredns created
deployment.apps/coredns created
service/kube-dns created
clusterrole.rbac.authorization.k8s.io/coredns created
clusterrolebinding.rbac.authorization.k8s.io/coredns created
Restarting kubelet
Adding argument --cluster-domain to nodes.
Configuring node 182.166.77.151
Failed to reach node. HTTPSConnectionPool(host='182.166.77.151', port=25000): Max retries exceeded with url: /cluster/api/v1.0/configure (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f2eca145a58>: Failed to establish a new connection: [Errno 110] Connection timed out',))
At this point it fails trying to reach the first node. Running "microk8s kubectl get nodes -o wide" I see that all nodes are using their default interface IPs as their "internal IP" and no "external IP" is set. The first node, which the CoreDNS addon is attempting to reach, should be reached over 10.99.0.2, but it is being addressed as 182.166.77.151, the default address for that VM.
So to clarify, the issue is that the node's "internal IP" is taken from the default interface rather than from the private network. Is there any way to specify the advertised IP of the node when installing (or post-install)? I see there is an option to use the loopback address, but that seems to be it.
Update: to set the internal IPs you can run the following (as root on each node):
echo --node-ip=$(ip address show dev YOUR_PRIVATE_INTERFACE | grep 'inet ' | awk -F ' ' '{print $2}' | sed 's/\/24//g') >> /var/snap/microk8s/current/args/kubelet; microk8s stop; microk8s start
Replace YOUR_PRIVATE_INTERFACE with the name of the private interface. I don't know whether this is correct or causes other issues, but it at least sets the internal IP correctly.
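For reference, a slightly more defensive variant of the same idea (only a sketch; wg0 is an assumed interface name, and the grep guard just keeps reruns from appending the flag twice):
# Sketch: report the private interface's address as the node IP (run as root).
# wg0 is an assumption; substitute your own VPN/private interface name.
IFACE=wg0
NODE_IP=$(ip -4 -o addr show dev "$IFACE" | awk '{print $4}' | cut -d/ -f1)
# Append --node-ip only if it is not already present, so the snippet can be rerun safely.
grep -q -- '--node-ip=' /var/snap/microk8s/current/args/kubelet || \
  echo "--node-ip=${NODE_IP}" >> /var/snap/microk8s/current/args/kubelet
microk8s stop
microk8s start
# The INTERNAL-IP column should now show the private address.
microk8s kubectl get nodes -o wide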
@ktsakalozos Is there a valid way to do this? I attempted the following to solve my issue (a sketch of the Calico piece follows this list):
- Hardcode the IP in the Calico config (/var/snap/microk8s/current/args/cni-network/cni.yaml)
- Hardcode the IP in the API server config by setting --advertise-address in /var/snap/microk8s/current/args/kube-apiserver
- Hardcode the IP in the kubelet config by setting --node-ip in /var/snap/microk8s/current/args/kubelet
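Two of these edits are plain flag appends into the args files; the Calico one is less obvious. One way to do it (a sketch only, since the list above doesn't say exactly which field was hardcoded; it assumes a WireGuard interface wg0 and Calico's default "first-found" autodetection value in cni.yaml) is to pin calico-node's IP autodetection to the VPN interface:
# Point calico-node's IP autodetection at the VPN interface (assumed name: wg0).
# IP_AUTODETECTION_METHOD is a standard calico-node environment variable;
# "first-found" is assumed to be the value currently present in cni.yaml.
sed -i 's/"first-found"/"interface=wg0"/' /var/snap/microk8s/current/args/cni-network/cni.yaml
# Re-apply the manifest so the calico-node DaemonSet rolls out with the new setting.
microk8s kubectl apply -f /var/snap/microk8s/current/args/cni-network/cni.yaml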
It worked when I started the node, but when I tried to join these nodes together it went completely nowhere with the following error logs:
Jul 6 23:46:39 one microk8s.daemon-kubelite[109803]: E0706 23:46:39.564207 109803 available_controller.go:508] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.152.183.165:443/apis/metrics.k8s.io/v1beta1: Get "https://10.152.183.165:443/apis/metrics.k8s.io/v1beta1": dial tcp 10.152.183.165:443: connect: no route to host
Jul 6 23:46:39 one microk8s.daemon-kubelite[109803]: E0706 23:46:39.633782 109803 kubelet.go:2291] "Error getting node" err="node \"one.us.vpc\" not found"
Jul 6 23:46:39 one microk8s.daemon-kubelite[109803]: E0706 23:46:39.734625 109803 kubelet.go:2291] "Error getting node" err="node \"one.us.vpc\" not found"
Jul 6 23:46:39 one microk8s.daemon-kubelite[109803]: E0706 23:46:39.834942 109803 kubelet.go:2291] "Error getting node" err="node \"one.us.vpc\" not found"
Jul 6 23:46:39 one microk8s.daemon-kubelite[109803]: E0706 23:46:39.936713 109803 kubelet.go:2291] "Error getting node" err="node \"one.us.vpc\" not found"
Jul 6 23:46:40 one microk8s.daemon-kubelite[109803]: E0706 23:46:40.037661 109803 kubelet.go:2291] "Error getting node" err="node \"one.us.vpc\" not found"
Jul 6 23:46:40 one microk8s.daemon-kubelite[109803]: E0706 23:46:40.138446 109803 kubelet.go:2291] "Error getting node" err="node \"one.us.vpc\" not found"
Jul 6 23:46:40 one microk8s.daemon-kubelite[109803]: E0706 23:46:40.239243 109803 kubelet.go:2291] "Error getting node" err="node \"one.us.vpc\" not found"
Jul 6 23:46:40 one microk8s.daemon-kubelite[109803]: E0706 23:46:40.339734 109803 kubelet.go:2291] "Error getting node" err="node \"one.us.vpc\" not found"
Jul 6 23:46:40 one microk8s.daemon-kubelite[109803]: E0706 23:46:40.440353 109803 kubelet.go:2291] "Error getting node" err="node \"one.us.vpc\" not found"
Jul 6 23:46:40 one microk8s.daemon-kubelite[109803]: E0706 23:46:40.541775 109803 kubelet.go:2291] "Error getting node" err="node \"one.us.vpc\" not found"
Jul 6 23:46:40 one microk8s.daemon-kubelite[109803]: E0706 23:46:40.542045 109803 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://127.0.0.1:16443/api/v1/services?limit=500&resourceVersion=0": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "10.152.183.1")
@afeiszli check my answer below. It works for one node, but when you try to join these nodes together it breaks the whole cluster.
Jul 6 23:59:59 one microk8s.daemon-kubelite[134568]: E0706 23:59:59.270000 134568 kubelet_node_status.go:586] "Failed to set some node status fields" err="failed to validate nodeIP: node IP: \"10.11.11.5\" not found in the host's network interfaces" node="one.us.vpc"
Jul 7 00:00:00 one microk8s.daemon-kubelite[134568]: E0707 00:00:00.876002 134568 available_controller.go:508] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.152.183.165:443/apis/metrics.k8s.io/v1beta1: Get "https://10.152.183.165:443/apis/metrics.k8s.io/v1beta1": dial tcp 10.152.183.165:443: connect: no route to host
It actually worked on the second try, but it still produced some weird logs...
I realized that, for some reason, after joining the cluster the --node-ip in the kubelet args file of the joining node changes to the IP of the node it joins.
For example:
node 1 (ip1): microk8s add-node
node 2 (ip2): microk8s join ...
node 2 ends up with ip1 in its kubelet config
Okay, I was able to join all the nodes and then manually override their IP addresses. However, when trying to run some pods I'm getting the following error:
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "b9b97afc5e9b38f8dd320c90e1fe96bb6268ef46df376ff694730718deea03b2": error getting ClusterInformation: Get https://[10.152.183.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.152.183.1:443: connect: no route to host
I flushed iptables as suggested in other issues. I'm still getting the same error, though from time to time it's a timeout instead of "no route to host". The weirdest thing about all this is that pods do start occasionally, so it's the kind of bug that is sometimes there and sometimes isn't. But 99% of the time it is there.
[from the Netmaker side]: Have you checked with a simple ping to make sure nodes are still able to talk to each other over the private addresses (you can also run "wg show" and check the last handshake)? If they are not, it may be necessary to restart the network interface after an iptables flush depending on the rules that were put in place, which can be done with "netclient pull -n
Can you make sure (probably by editing the /etc/hosts on each node) that the hostname of each node resolves to the IP on the VPN?
This fixed the ClusterIP issue for me. For posterity, a quick hack to run on each node that takes all the node hostnames and advertised addresses and inserts them into /etc/hosts:
echo "$(microk8s kubectl get nodes -o wide | awk 'NR>1{ print $6, $1 }')" >> /etc/hosts
@afeiszli Did this solve the timeout issue we discussed today? Because it didn't help me much, to be honest.
@ArseniiPetrovich, @ktsakalozos mentioned that hostnames may need to be reachable before running the cluster install. I've reinstalled, setting up the hostnames first. I still received a timeout, but it appears to be better, at least for now.
I believe I've found a key piece of this. If you do a "kubectl describe svc kubernetes -n default", you will see the endpoints are still pointing to the public IPs. To fix this you must stop each node (microk8s stop) and edit /var/snap/microk8s/current/args/kube-apiserver, adding the following line: --advertise-address=<private node IP>
When the node comes back up, the endpoint should be correct. This fixed one of my remaining timeout errors. I'll leave the cluster up for a while to see if other errors are encountered, but the network seems to be behaving now.
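Concretely, the per-node procedure looks roughly like this (a sketch; 10.99.0.2 stands in for the node's private address):
microk8s stop
# Advertise the private address to the rest of the cluster
echo "--advertise-address=10.99.0.2" >> /var/snap/microk8s/current/args/kube-apiserver
microk8s start
# The kubernetes service endpoints should now list the private addresses
microk8s kubectl describe svc kubernetes -n default
microk8s kubectl get endpoints kubernetes -n default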
@afeiszli When I did this last time I received an error about certificates, and new nodes were not able to join the MicroK8s server. Can you try adding a new node under this setup?
@afeiszli @ktsakalozos Small friendly ping here :)
Same problem here; incredible that so many Kubernetes distributions have the same problem. I could only get it to work with k3s. It would be so simple to provide a --node-ip flag, or even to respect the interface used when the first node joins.
Just for the issue record, my use case: I have VPSs with different cloud providers and bare-metal machines with residential IPs, so I don't have a common private network. To solve this, I'm using WireGuard to wrap all machines in the same private network (interface wg0).
What I expect: every node joins through the IP provided by the wg0 interface and gets the internal IP from wg0. What I got: the main Ubuntu interface's IP.
Is there any way to make this happen? I just need every node to talk to the others through a chosen interface, or even the one used in the join command.
Thanks to everyone providing bits of advice in this thread.
I'm also using MicroK8s to run a cluster where every node runs on a VPS from a different provider. The VPSs are on a tinc network together, forming the subnet 10.x.y.z/24. So the interface I want to work with (tun1) is not the default interface.
This is what worked for me:
On every node (including the master(s)):
- microk8s stop (stop all nodes before changing configuration files)
- Get the VPN IP of the node, e.g. 10.x.y.z. The command ip a show dev tun1 will show the info for interface tun1.
- Add this to the bottom of /var/snap/microk8s/current/args/kubelet: --node-ip=10.x.y.z
- Add this to the bottom of /var/snap/microk8s/current/args/kube-apiserver: --advertise-address=10.x.y.z
- microk8s start
Now I see the correct values in the INTERNAL-IP column with microk8s kubectl get nodes -o wide.
> I realized that, for some reason, after joining the cluster the --node-ip in the kubelet args file of the joining node changes to the IP of the node it joins. For example: node 1 (ip1): microk8s add-node; node 2 (ip2): microk8s join ...; node 2 ends up with ip1 in its kubelet config
I'm also seeing this. I have to change /var/snap/microk8s/current/args/kubelet again after joining to make the changes stick. Obviously a bug!
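Until that is fixed, a workaround sketch is to re-assert the flag after every join, replacing whatever value the join wrote (10.x.y.z is a placeholder for this node's VPN address; run as root):
# Drop any --node-ip line the join may have rewritten, then set the VPN address again.
sed -i '/^--node-ip=/d' /var/snap/microk8s/current/args/kubelet
echo "--node-ip=10.x.y.z" >> /var/snap/microk8s/current/args/kubelet
microk8s stop && microk8s start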
Hello all,
I have the same issue here. My workers have two interfaces and microk8s join picks the "secondary" one. I tried several of the suggestions in the comments, but none of them worked. Apparently, kubelet ignores the --node-ip flag, as well as the BGP IP detection method in the cni-network/cni.yaml file.
In my case, ubuntu1 is a simple worker.
~$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ubuntu1 Ready <none> 15m v1.23.1-2+b7120affd6631a 10.162.X.Y <none> Ubuntu 20.04.3 LTS 5.4.0-1050-raspi containerd://1.5.7
dellemc Ready <none> 3h24m v1.23.1-2+9e8d1bf4219080 172.30.248.86 <none> Ubuntu 20.04.3 LTS 5.4.0-89-generic containerd://1.5.7
and it has the following routes:
# sudo ip route list
default via 169.254.66.80 dev eth0 proto static
default via 10.162.16.1 dev wlan0 proto dhcp src 10.162.X.Y metric 600
...
I have followed this thread and the related ones, but I could not find any solution. I would appreciate some help with this configuration. Best,
Only k3s solves that problem. Incredible how such a simple use case is a huge problem across so many frameworks.
Hi @wisaaco
This is most likely related to having two default interfaces on your machines. Can you elaborate on the steps that were taken when creating the cluster? That is, the microk8s add-node and microk8s join commands that you used to form the cluster.
Please also include the contents of /var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml. This will help with tracking down why this happens. Thanks!
Hi @neoaggelos,
In one of my attempts I tried removing one of the default interfaces on a fresh installation, but then the snap.microk8s.daemon-kubelite service did not start.
In any case, I followed these steps:
- In the master:
$ microk8s add-node
From the node you wish to join to this cluster, run the following:
microk8s join 172.30.248.86:25000/587970307c112959444429819caa0abc/e31414f43733
Use the '--worker' flag to join a node as a worker not running the control plane, eg:
microk8s join 172.30.248.86:25000/587970307c112959444429819caa0abc/e31414f43733 --worker
If the node you are adding is not reachable through the default interface you can use one of the following:
microk8s join 172.30.248.86:25000/587970307c112959444429819caa0abc/e31414f43733
microk8s join 172.17.0.1:25000/587970307c112959444429819caa0abc/e31414f43733
microk8s join 192.168.49.1:25000/587970307c112959444429819caa0abc/e31414f43733
microk8s join 172.18.0.1:25000/587970307c112959444429819caa0abc/e31414f43733
microk8s join fc00:f853:ccd:e793::1:25000/587970307c112959444429819caa0abc/e31414f43733
My master also has two interfaces:
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 2c:ea:7f:e0:17:70 brd ff:ff:ff:ff:ff:ff
inet 172.30.248.86/22 brd 172.30.251.255 scope global dynamic eno1
3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 2c:ea:7f:e0:17:71 brd ff:ff:ff:ff:ff:ff
inet 169.254.66.80/16 brd 169.254.255.255 scope global eno2
- From the output, I chose the following command to run on the worker:
microk8s join 172.30.248.86:25000/aa48a0e60504b58ab9a0d6191206f1ae/e31414f43733 --worker
If I instead choose microk8s join 192.168.49. ..., the worker connects to the master but it is not visible from the master.
Note that all devices can ping each other. On the master, eno1 provides internet access and eno2 is local. The workers are connected to the master via eno2.
The content of /var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml is:
$ cat /var/snap/microk8s/current/var/kubernetes/backend/cluster.yaml
- Address: 172.30.248.86:19001
ID: 3297041220608546238
Role: 0
Thanks!
I'm having issues after deleting a second node from an HA cluster without having it "leave" the cluster first. This happens because I'm using the Kubernetes autoscaler.
Behavior: after the second node is removed, the master is stuck forever.
Notes:
- I have two interfaces, one public and one internal (VPN)
- I tried manually removing the second node's reference from the /var/snap/microk8s/current/var/kubernetes/backend/*.yaml files
- I tried restarting the master node
- My internal network (VPC) is working.
I haven't had any success.
Thanks in advance for the help.
I found a way to monkey-patch the network routes by basically changing the default interface. However, this should not be considered a solution, since it BREAKS SSH.
Supposing this is the output of route -n:
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.255.255.1 0.0.0.0 UG 0 0 0 ens192
10.1.68.128 10.1.68.128 255.255.255.192 UG 0 0 0 vxlan.calico
10.152.183.0 192.168.1.1 255.255.255.0 UG 0 0 0 ens224
10.255.255.1 0.0.0.0 255.255.255.255 UH 0 0 0 ens192
192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 ens224
# Remove the default WAN route (ens192 in my case); WILL BREAK SSH
sudo route del -net 0.0.0.0 gw 10.255.255.1 netmask 0.0.0.0 dev ens192
# Change the default interface to redirect default traffic to LAN interface (ens224 in my case)
sudo ip route add default via 192.168.1.1 dev ens224
If you did this on a server and want to get SSH back, you will have to run the following from a KVM console. (I was there, Gandalf.)
# Set rule to redirect default traffic to ens192 WAN interface (Will restore SSH)
sudo ip route add default via 10.255.255.1 dev ens192
> I realized that, for some reason, after joining the cluster the --node-ip in the kubelet args file of the joining node changes to the IP of the node it joins. For example: node 1 (ip1): microk8s add-node; node 2 (ip2): microk8s join ...; node 2 ends up with ip1 in its kubelet config
>
> I'm also seeing this. I have to change /var/snap/microk8s/current/args/kubelet again after joining to make the changes stick. Obviously a bug!
I tried experimenting with the --address option in /var/snap/microk8s/current/args/kubelet, set to the address of the interface I want to use (192.168.1.0 for example), but it didn't work out. I also tried --bind-address instead of --advertise-address, but no luck there either.
The reason I'm trying this is that if you have dns enabled and you want to join a node without changing the node IP first, it always times out...