Wrong private network interface fails validation during k0s upgrade
My nodes have only one private network interface (loopback excluded), so I ran the installation with automatic interface detection. Now I wanted to upgrade k0s to a newer version and got a validation error.
INFO ==> Running phase: Download k0s on hosts
INFO [OpenSSH] waahhhh-earth: downloading k0s v1.29.1+k0s.0
INFO [OpenSSH] waahhhh-mercury: downloading k0s v1.29.1+k0s.0
INFO [OpenSSH] waahhhh-earth: validating configuration
INFO ==> Apply failed
Therefore I validated the schema with the current and new k0s version. Everything is valid. The log contains the following message:
time="04 Feb 24 23:49 CET" level=debug msg="[OpenSSH] waahhhh-earth: (stderr) Error: spec: api: address: Invalid value: \"waahhhh-earth\": invalid IP address"
The configuration section looks as follows and has not changed.
apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: universe
spec:
  hosts:
  - role: controller+worker
    openSSH:
      address: waahhhh-earth
  - role: worker
    openSSH:
      address: waahhhh-mercury
  k0s:
    version: v1.29.1+k0s.1
    dynamicConfig: false
    config:
      apiVersion: k0s.k0sproject.io/v1beta1
      kind: ClusterConfig
      metadata:
        name: k0s
      spec:
        api:
          externalAddress: XX.XX.XX.XX # valid IPv4
          k0sApiPort: 9443
          port: 6443
        installConfig: ...
At first I thought the problem was related to the openSSH definition, because the address (waahhhh-earth) is not used as the hostname or anywhere else in the configuration.
Only after reading the output several times did I realize that the wrong network interface was used.
INFO [OpenSSH] waahhhh-earth: discovered tun as private interface
INFO [OpenSSH] waahhhh-mercury: discovered tun as private interface
The interface (tun) was set up by the Kubernetes cluster and k0sctl now considers it to be the right network.
The problem could be solved with the following adjustment:
hosts:
- role: controller+worker
  privateInterface: eth0
  openSSH:
    address: waahhhh-earth
- role: worker
  privateInterface: eth0
  openSSH:
    address: waahhhh-mercury
If you read the entire output, it's very confusing at first. Maybe we could make some improvements here so that other users don't face the same problem.
This is what the complete output looks like. The actual problem is hidden in what looks like an ordinary info message.
$ k0sctl apply --config resources/k0s/k0sctl.yaml
⠀⣿⣿⡇⠀⠀⢀⣴⣾⣿⠟⠁⢸⣿⣿⣿⣿⣿⣿⣿⡿⠛⠁⠀⢸⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⠀█████████ █████████ ███
⠀⣿⣿⡇⣠⣶⣿⡿⠋⠀⠀⠀⢸⣿⡇⠀⠀⠀⣠⠀⠀⢀⣠⡆⢸⣿⣿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀███ ███ ███
⠀⣿⣿⣿⣿⣟⠋⠀⠀⠀⠀⠀⢸⣿⡇⠀⢰⣾⣿⠀⠀⣿⣿⡇⢸⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⠀███ ███ ███
⠀⣿⣿⡏⠻⣿⣷⣤⡀⠀⠀⠀⠸⠛⠁⠀⠸⠋⠁⠀⠀⣿⣿⡇⠈⠉⠉⠉⠉⠉⠉⠉⠉⢹⣿⣿⠀███ ███ ███
⠀⣿⣿⡇⠀⠀⠙⢿⣿⣦⣀⠀⠀⠀⣠⣶⣶⣶⣶⣶⣶⣿⣿⡇⢰⣶⣶⣶⣶⣶⣶⣶⣶⣾⣿⣿⠀█████████ ███ ██████████
k0sctl v0.17.4 Copyright 2023, k0sctl authors.
Anonymized telemetry of usage will be sent to the authors.
By continuing to use k0sctl you agree to these terms:
https://k0sproject.io/licenses/eula
INFO ==> Running phase: Connect to hosts
INFO [OpenSSH] waahhhh-earth: connected
INFO [OpenSSH] waahhhh-mercury: connected
INFO ==> Running phase: Detect host operating systems
INFO [OpenSSH] waahhhh-mercury: is running Ubuntu 22.04.3 LTS
INFO [OpenSSH] waahhhh-earth: is running Ubuntu 22.04.3 LTS
INFO ==> Running phase: Acquire exclusive host lock
INFO ==> Running phase: Prepare hosts
INFO ==> Running phase: Gather host facts
INFO [OpenSSH] waahhhh-earth: using earth as hostname
INFO [OpenSSH] waahhhh-mercury: using mercury as hostname
INFO [OpenSSH] waahhhh-earth: discovered tun as private interface
INFO [OpenSSH] waahhhh-mercury: discovered tun as private interface
INFO ==> Running phase: Validate hosts
INFO ==> Running phase: Gather k0s facts
INFO [OpenSSH] waahhhh-earth: found existing configuration
INFO [OpenSSH] waahhhh-earth: is running k0s controller+worker version v1.28.4+k0s.0
WARN [OpenSSH] waahhhh-earth: the controller+worker node will not schedule regular workloads without toleration for node-role.kubernetes.io/master:NoSchedule unless 'noTaints: true' is set
WARN [OpenSSH] waahhhh-earth: k0s will be upgraded
INFO [OpenSSH] waahhhh-mercury: is running k0s worker version v1.28.4+k0s.0
WARN [OpenSSH] waahhhh-mercury: k0s will be upgraded
INFO [OpenSSH] waahhhh-earth: checking if worker mercury has joined
INFO ==> Running phase: Validate facts
INFO ==> Running phase: Download k0s on hosts
INFO [OpenSSH] waahhhh-earth: downloading k0s v1.29.1+k0s.0
INFO [OpenSSH] waahhhh-mercury: downloading k0s v1.29.1+k0s.0
INFO [OpenSSH] waahhhh-earth: validating configuration
INFO ==> Apply failed
The current network interfaces are:
$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff
altname enp0s3
altname ens3
inet XX.XX.XX.XX/22 brd XX.XX.XX.XX scope global eth0
valid_lft forever preferred_lft forever
inet6 XX:XX:XX:XX::1/64 scope global
valid_lft forever preferred_lft forever
inet6 XX::XX:XX:XX:XX/64 scope link
valid_lft forever preferred_lft forever
3: kube-bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff
inet 10.244.0.1/24 brd 10.244.0.255 scope global kube-bridge
valid_lft forever preferred_lft forever
inet6 XX::XX:XX:XX:XX/64 scope link
valid_lft forever preferred_lft forever
5: veth4c8aee89@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master kube-bridge state UP group default
link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff link-netns cni-XX-XX-XX-XX-XX
inet6 XX::XX:XX:XX:XX/64 scope link
valid_lft forever preferred_lft forever
...
9: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
10: tun-852356528@eth0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
link/ipip XX.XX.XX.XX peer XX.XX.XX.XX
inet6 XX::XX:XX:XX:XX/64 scope link
valid_lft forever preferred_lft forever
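Reduced to just the interface names, the listing above is easier to compare with the "discovered tun as private interface" log line. The following is a minimal sketch: `list_ifaces` is a hypothetical helper name (not part of k0sctl), and the sample text is the set of header lines copied from the output above.

```shell
# Sketch: list interface names from saved "ip a" output (header lines only).
# The sample lines are copied from the output posted in this thread.
ip_a_sample='1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
3: kube-bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
10: tun-852356528@eth0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000'

# list_ifaces is a hypothetical helper name, not part of k0sctl.
list_ifaces() {
  # Header lines look like "N: name[@parent]: <FLAGS> ..."; keep only the name.
  awk -F': ' '/^[0-9]+: / { sub(/@.*/, "", $2); print $2 }'
}

printf '%s\n' "$ip_a_sample" | list_ifaces
```

On a live node the same function could be fed with `ip a | list_ifaces`. For the sample, none of the listed names is plain `tun`; the closest is `tun-852356528`.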
(ip route list scope global | grep -E "\b(172|10|192\.168)\.") || (ip route list | grep -m1 default)
This is the command k0sctl uses to detect the private interface. What does the output of that look like?
$ echo $((ip route list scope global | grep -E "\b(172|10|192\.168)\.") || (ip route list | grep -m1 default))
10.244.1.0/24 dev tun-852356528 proto 17 src 89.XX.XX.XX
$ echo $(ip route list scope global | grep -E "\b(172|10|192\.168)\.")
10.244.1.0/24 dev tun-852356528 proto 17 src 89.XX.XX.XX
$ echo $(ip route list | grep -m1 default)
default via 89.XX.XX.XX dev eth0 proto static
So I think you don't have a private interface: on the first run the command falls back to eth0, but once the cluster is up, the 10.244.1.0/24 route on the tun device matches the private-range pattern first 🤔
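One way to see why the result changes between runs is to replay the quoted command against saved route text. This is an illustrative sketch only: `pick_route` and `iface_of` are hypothetical names, the route samples are assembled from the output posted above, and reading the interface name from the `dev` field with word characters only is an assumption about k0sctl, not confirmed here (it would explain why the log says `tun` rather than `tun-852356528`).

```shell
# Sketch only: replays the quoted detection command against saved route text.
# Sample routes are assembled from the output posted in this thread.
global_routes='10.244.1.0/24 dev tun-852356528 proto 17 src 89.XX.XX.XX'
all_routes='default via 89.XX.XX.XX dev eth0 proto static
10.244.1.0/24 dev tun-852356528 proto 17 src 89.XX.XX.XX'

# pick_route (hypothetical name) mirrors:
#   (ip route list scope global | grep -E ...) || (ip route list | grep -m1 default)
# $1 stands in for "ip route list scope global", $2 for "ip route list".
pick_route() {
  printf '%s\n' "$1" | grep -E '\b(172|10|192\.168)\.' \
    || printf '%s\n' "$2" | grep -m1 default
}

# ASSUMPTION: the interface name is taken from the "dev" field using word
# characters only, so "tun-852356528" is cut off at the hyphen.
iface_of() {
  sed -n 's/.*dev \([A-Za-z0-9_]*\).*/\1/p' | head -n1
}

pick_route "$global_routes" "$all_routes" | iface_of   # running cluster: prints tun
pick_route "" "$all_routes" | iface_of                 # no pod route yet: prints eth0
```

With the pod-network route present, the private-range pattern matches before the default-route fallback is ever tried, which is consistent with the behaviour differing between the initial install and the upgrade.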
Facing the same issue, is there a workaround while it isn't fixed?
EDIT: The workaround for me was to use IP addresses instead of hostnames in the list of hosts. Not ideal for me, since I'd rather use domain names, but it works.
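That workaround lines up with the logged error, which rejects the hostname waahhhh-earth as spec.api.address because it is not an IP address. A minimal sketch of that kind of check, for IPv4 only: `is_ipv4` is a hypothetical helper, not k0s's actual validation code.

```shell
# is_ipv4 is a hypothetical helper: succeeds only for dotted-quad IPv4 literals.
is_ipv4() {
  # Shape check: four groups of 1-3 digits separated by dots.
  printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || return 1
  # Range check: split on dots and require every octet to be at most 255.
  old_ifs=$IFS; IFS=.
  set -- $1
  IFS=$old_ifs
  for octet in "$@"; do
    [ "$octet" -le 255 ] || return 1
  done
}

# Values taken from this thread: the SSH hostname and the kube-bridge address.
for addr in waahhhh-earth 10.244.0.1; do
  if is_ipv4 "$addr"; then
    echo "$addr: IP literal"
  else
    echo "$addr: not an IP literal"
  fi
done
```

Running a check like this over the host addresses before an upgrade shows which entries would be handed over as hostnames if the private address cannot be determined.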