
Offline install or run with no gateway set - k3s service won't run

Open danielbarron42 opened this issue 4 years ago • 6 comments

Version: k3s version v0.9.1 (755bd1c6)

/usr/local/bin/k3s server --write-kubeconfig-mode 664 --no-deploy traefik --docker --cluster-cidr 10.244.0.0/16

Describe the bug: I am using k3s in air-gap/offline environments. I can install and run successfully without any internet access, but only if a gateway address is set. To be truly offline/air-gapped, I would like to be able to install and run without a gateway set. If no gateway is set, the service fails to start. For example, during install I get:

systemctl status k3s.service -l
● k3s.service - Lightweight Kubernetes
   Loaded: loaded (/etc/systemd/system/k3s.service; disabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Tue 2019-11-19 06:32:10 PST; 1min 56s ago
     Docs: https://k3s.io
  Process: 5175 ExecStart=/usr/local/bin/k3s server --write-kubeconfig-mode 664 --no-deploy traefik --docker --cluster-cidr 10.244.0.0/16 (code=exited, status=1/FAILURE)
  Process: 5173 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
  Process: 5170 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
 Main PID: 5175 (code=exited, status=1/FAILURE)

Nov 19 06:32:10 myhostname k3s[5175]: -v, --v Level                          number for the log level verbosity
Nov 19 06:32:10 myhostname k3s[5175]: --version version[=true]           Print version information and quit
Nov 19 06:32:10 myhostname k3s[5175]: --vmodule moduleSpec               comma-separated list of pattern=N settings for file-filtered logging
Nov 19 06:32:10 myhostname k3s[5175]: time="2019-11-19T06:32:10.544522701-08:00" level=fatal msg="apiserver exited: unable to find suitable network address.error='no default routes found in \"/proc/net/route\" or \"/proc/net/ipv6_route\"'. Try to set the AdvertiseAddress directly or provide a valid BindAddress to fix this"
Nov 19 06:32:10 myhostname systemd[1]: k3s.service holdoff time over, scheduling restart.
Nov 19 06:32:10 myhostname systemd[1]: Stopped Lightweight Kubernetes.
Nov 19 06:32:10 myhostname systemd[1]: start request repeated too quickly for k3s.service
Nov 19 06:32:10 myhostname systemd[1]: Failed to start Lightweight Kubernetes.
Nov 19 06:32:10 myhostname systemd[1]: Unit k3s.service entered failed state.
Nov 19 06:32:10 myhostname systemd[1]: k3s.service failed.

There are also circumstances where the gateway becomes unset after installation. When that happens, some of the pods go down, and a service running in the affected pods can no longer be reached remotely (from the local LAN, in the same subnet).

I believe this is caused by the code which tries to determine what IP to use for the node.

I have tried specifying --advertise-address as the IP of the node as well as trying 127.0.0.1. I tried the same with --bind-address.

To Reproduce: Attempt an air-gap install with no network gateway set.

Expected behavior: The install succeeds and all normal functionality works. After install, if the gateway is removed, all normal functionality continues to work.

Actual behavior: The install fails. The service errors as listed above.

danielbarron42 • Nov 19 '19 14:11

Sorry, it used to be documented that a default route is needed for air-gap. The docs were something like this:


If networking is completely disabled k3s may not be able to start (ie ethernet unplugged or wifi disconnected), in which case it may be necessary to add a default route. For example:

sudo ip -c address add 192.168.123.123/24 dev eno1
sudo ip route add default via 192.168.123.1

We should investigate what flags are needed to work without a default route, add better docs, and maybe check for a default route in check-config.

erikwilson • Nov 19 '19 15:11

Thank you for looking at this.

I have already scripted checking for a default route when installing. For example:

# Abort the install when the routing table has no default route.
if [ -z "$(ip route | grep '^default')" ]; then
    echo "default route missing" >&2
    exit 1
fi

Docs are nice. What would help the most is being able to install without a default route.

I have looked at the code and could not see anything 'easy'. The main problem is "how to determine the IP of the node" - especially on a host with more than one NIC. The simple answer is "it's the one with the default route" - so I can see why it's currently like that.
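The "it's the one with the default route" rule can be sketched in shell as follows. This only illustrates the idea; it is not the code k3s or flannel actually runs. The function name default_route_iface is made up for this example, and it assumes the usual `ip route` output format, where the interface name follows the `dev` keyword.

```shell
# Hypothetical helper: read `ip route` output on stdin and print the
# interface named on the first default route. Returns non-zero if the
# input contains no default route.
default_route_iface() {
    awk '
        $1 == "default" {
            for (i = 2; i < NF; i++) {
                if ($i == "dev") { print $(i + 1); found = 1; exit }
            }
        }
        END { exit (found ? 0 : 1) }
    '
}
```

It could be used as `ip route | default_route_iface || echo "default route missing"`. On a host with no gateway the function prints nothing and fails, which is the same situation the apiserver error above complains about.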

danielbarron42 • Nov 19 '19 17:11

I have found a workaround which allows a true air-gap installation:

#/etc/sysconfig/network-scripts/ifcfg-tap0
DEVICE=tap0
ONBOOT=yes
BOOTPROTO=none
TYPE=Tap
DEFROUTE=no
IPADDR=10.243.255.254
PREFIX=32

ifup tap0
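On distributions that do not use /etc/sysconfig/network-scripts, roughly the same tap interface can probably be created with iproute2 directly. The sketch below is an assumption, not something tested in this thread: the helper name tap_commands is hypothetical, it only prints the commands so they can be reviewed first, and an interface created this way does not persist across reboots.

```shell
# Hypothetical helper: print the iproute2 commands that create a tap
# device, give it an address, and bring it up. Nothing is executed here.
tap_commands() {
    dev="$1"    # interface name, e.g. tap0
    addr="$2"   # address with prefix length, e.g. 10.243.255.254/32
    printf 'ip tuntap add dev %s mode tap\n' "$dev"
    printf 'ip addr add %s dev %s\n' "$addr" "$dev"
    printf 'ip link set dev %s up\n' "$dev"
}
```

For example, `tap_commands tap0 10.243.255.254/32` prints the three commands matching the ifcfg file above; once checked, the output can be piped to `sudo sh`.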

Add --flannel-iface tap0 to ExecStart=/usr/local/bin/k3s server in /etc/systemd/system/k3s.service

Alternatively just add --flannel-iface <your eth interface>
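Instead of editing /etc/systemd/system/k3s.service in place, the extra flag could also go into a systemd drop-in, which is less likely to be lost if the unit file is regenerated. This is a sketch under assumptions: the helper name flannel_iface_dropin and the drop-in file name used below are made up for illustration, and the k3s command line has to be passed in exactly as it appears in your own unit file.

```shell
# Hypothetical helper: print a systemd drop-in that replaces ExecStart
# with the given k3s command line plus --flannel-iface <iface>.
flannel_iface_dropin() {
    iface="$1"
    shift                        # remaining arguments: the existing command line
    printf '[Service]\n'
    printf 'ExecStart=\n'        # an empty assignment clears the unit's ExecStart
    printf 'ExecStart=%s --flannel-iface %s\n' "$*" "$iface"
}
```

A possible way to apply it, using the command line from the original report:

sudo mkdir -p /etc/systemd/system/k3s.service.d
flannel_iface_dropin tap0 /usr/local/bin/k3s server --write-kubeconfig-mode 664 --no-deploy traefik --docker --cluster-cidr 10.244.0.0/16 | sudo tee /etc/systemd/system/k3s.service.d/flannel-iface.conf
sudo systemctl daemon-reload
sudo systemctl restart k3s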

I traced the problem to flannel wanting to know which IP to bind to. To find it, flannel looks for the interface that has the default route and takes that interface's IP address. If you specify the interface, it does not need a default route. If you specify a tap interface, you don't even need an Ethernet interface up with an IP.

It would be better if it did some of this itself and didn't rely on there being a default route. So I don't think this is only a documentation defect/feature request.

danielbarron42 • Jan 08 '20 07:01

Interested in this too. My use case is that I periodically keep my RPi k3s cluster off any network (in a camper) and then bring it back home and put it back on my home network. k3s works great when connected to my home network, but when it has no connection to a network (and therefore no default gateway), the k3s service fails to start.

branttaylor • Dec 17 '21 04:12

This would really be a nice benefit if k3s could more reliably start in different network conditions. Testing @danielbarron42's suggestion - but this becomes a real foot-gun with k3s currently!

One note that I haven't 100% confirmed yet - starting the cluster with --cluster-cidr= seems to side-step this issue.

erulabs • Sep 02 '22 18:09

I have k3s working reliably air-gapped and without a gateway for years using my above workaround. Recently I found a couple of other things I do are required as well. This is what I do:

--flannel-iface tap0 (see how to make the tap0 interface in my workaround above)

--kube-proxy-arg "proxy-mode=ipvs" (needed because of how it creates routes to pods and services, bypassing the need for a gateway)

I make sure the coredns configmap doesn't have a forward entry if there's no DNS configured, which I would expect to be the case when air-gapped. Otherwise coredns won't start.
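As a rough illustration of the coredns change, the filter below drops `forward` lines from a Corefile. It is a sketch under assumptions: the name strip_forward is hypothetical, it only handles single-line forward directives (not ones with a `{ ... }` block), and it assumes the Corefile is stored in a configmap named coredns in the kube-system namespace.

```shell
# Hypothetical filter: read a Corefile on stdin and print it without
# single-line `forward` directives. All other lines pass through unchanged.
strip_forward() {
    awk '$1 != "forward"'
}
```

To preview the result against a running cluster, something like `kubectl -n kube-system get configmap coredns -o jsonpath='{.data.Corefile}' | strip_forward` should work; the edited Corefile then still has to be written back, for example with `kubectl -n kube-system edit configmap coredns`.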

I also set:

--cluster-cidr 10.242.0.0/16 --service-cidr 10.243.0.0/16 --cluster-dns 10.243.0.10

But I've not tested whether that's required for air-gapped operation. This is only to avoid overlap with some local networks.

danielbarron42 • Sep 03 '22 08:09