k0sctl: wrong private network interface fails validation during k0s upgrade

My nodes have only one private network interface (loopback excluded), so I let the installation detect it automatically. Now I wanted to upgrade k0s to a newer version and got a validation error.

INFO ==> Running phase: Download k0s on hosts
INFO [OpenSSH] waahhhh-earth: downloading k0s v1.29.1+k0s.0
INFO [OpenSSH] waahhhh-mercury: downloading k0s v1.29.1+k0s.0
INFO [OpenSSH] waahhhh-earth: validating configuration
INFO ==> Apply failed

I therefore validated the schema against both the current and the new k0s version. Everything is valid. The debug log contains the following message:

time="04 Feb 24 23:49 CET" level=debug msg="[OpenSSH] waahhhh-earth: (stderr) Error: spec: api: address: Invalid value: \"waahhhh-earth\": invalid IP address"

The configuration section looks as follows and has not changed.

apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: universe
spec:
  hosts:
    - role: controller+worker
      openSSH:
        address: waahhhh-earth
    - role: worker
      openSSH:
        address: waahhhh-mercury
  k0s:
    version: v1.29.1+k0s.1
    dynamicConfig: false
    config:
      apiVersion: k0s.k0sproject.io/v1beta1
      kind: ClusterConfig
      metadata:
        name: k0s
      spec:
        api:
          externalAddress: XX.XX.XX.XX # valid IPv4
          k0sApiPort: 9443
          port: 6443
        installConfig: ...

At first I thought the problem was related to the openSSH definition, because the address (waahhhh-earth) is not specified as the hostname or anywhere else in the configuration.

Only after reading the output several times did I realize that the wrong network interface was being used.

INFO [OpenSSH] waahhhh-earth: discovered tun as private interface
INFO [OpenSSH] waahhhh-mercury: discovered tun as private interface

The tun interface was set up by the Kubernetes cluster itself, and k0sctl now treats it as the private network interface. Setting privateInterface explicitly for each host solved the problem:


hosts:
  - role: controller+worker
    privateInterface: eth0
    openSSH:
      address: waahhhh-earth
  - role: worker
    privateInterface: eth0
    openSSH:
      address: waahhhh-mercury

If you read the entire output, it's very confusing at first. Maybe we could make some improvements here so that other users don't face the same problem.

This is the complete output, in which the actual problem hides in an ordinary INFO message.

$ k0sctl apply --config resources/k0s/k0sctl.yaml

⠀⣿⣿⡇⠀⠀⢀⣴⣾⣿⠟⠁⢸⣿⣿⣿⣿⣿⣿⣿⡿⠛⠁⠀⢸⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⠀█████████ █████████ ███
⠀⣿⣿⡇⣠⣶⣿⡿⠋⠀⠀⠀⢸⣿⡇⠀⠀⠀⣠⠀⠀⢀⣠⡆⢸⣿⣿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀███          ███    ███
⠀⣿⣿⣿⣿⣟⠋⠀⠀⠀⠀⠀⢸⣿⡇⠀⢰⣾⣿⠀⠀⣿⣿⡇⢸⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⠀███          ███    ███
⠀⣿⣿⡏⠻⣿⣷⣤⡀⠀⠀⠀⠸⠛⠁⠀⠸⠋⠁⠀⠀⣿⣿⡇⠈⠉⠉⠉⠉⠉⠉⠉⠉⢹⣿⣿⠀███          ███    ███
⠀⣿⣿⡇⠀⠀⠙⢿⣿⣦⣀⠀⠀⠀⣠⣶⣶⣶⣶⣶⣶⣿⣿⡇⢰⣶⣶⣶⣶⣶⣶⣶⣶⣾⣿⣿⠀█████████    ███    ██████████
k0sctl v0.17.4 Copyright 2023, k0sctl authors.
Anonymized telemetry of usage will be sent to the authors.
By continuing to use k0sctl you agree to these terms:
https://k0sproject.io/licenses/eula
INFO ==> Running phase: Connect to hosts
INFO [OpenSSH] waahhhh-earth: connected
INFO [OpenSSH] waahhhh-mercury: connected
INFO ==> Running phase: Detect host operating systems
INFO [OpenSSH] waahhhh-mercury: is running Ubuntu 22.04.3 LTS
INFO [OpenSSH] waahhhh-earth: is running Ubuntu 22.04.3 LTS
INFO ==> Running phase: Acquire exclusive host lock
INFO ==> Running phase: Prepare hosts
INFO ==> Running phase: Gather host facts
INFO [OpenSSH] waahhhh-earth: using earth as hostname
INFO [OpenSSH] waahhhh-mercury: using mercury as hostname
INFO [OpenSSH] waahhhh-earth: discovered tun as private interface
INFO [OpenSSH] waahhhh-mercury: discovered tun as private interface
INFO ==> Running phase: Validate hosts
INFO ==> Running phase: Gather k0s facts
INFO [OpenSSH] waahhhh-earth: found existing configuration
INFO [OpenSSH] waahhhh-earth: is running k0s controller+worker version v1.28.4+k0s.0
WARN [OpenSSH] waahhhh-earth: the controller+worker node will not schedule regular workloads without toleration for node-role.kubernetes.io/master:NoSchedule unless 'noTaints: true' is set
WARN [OpenSSH] waahhhh-earth: k0s will be upgraded
INFO [OpenSSH] waahhhh-mercury: is running k0s worker version v1.28.4+k0s.0
WARN [OpenSSH] waahhhh-mercury: k0s will be upgraded
INFO [OpenSSH] waahhhh-earth: checking if worker mercury has joined
INFO ==> Running phase: Validate facts
INFO ==> Running phase: Download k0s on hosts
INFO [OpenSSH] waahhhh-earth: downloading k0s v1.29.1+k0s.0
INFO [OpenSSH] waahhhh-mercury: downloading k0s v1.29.1+k0s.0
INFO [OpenSSH] waahhhh-earth: validating configuration
INFO ==> Apply failed

The current network interfaces are:

$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff
    altname enp0s3
    altname ens3
    inet XX.XX.XX.XX/22 brd XX.XX.XX.XX scope global eth0
       valid_lft forever preferred_lft forever
    inet6 XX:XX:XX:XX::1/64 scope global
       valid_lft forever preferred_lft forever
    inet6 XX::XX:XX:XX:XX/64 scope link
       valid_lft forever preferred_lft forever
3: kube-bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff
    inet 10.244.0.1/24 brd 10.244.0.255 scope global kube-bridge
       valid_lft forever preferred_lft forever
    inet6 XX::XX:XX:XX:XX/64 scope link
       valid_lft forever preferred_lft forever
5: veth4c8aee89@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master kube-bridge state UP group default
    link/ether XX:XX:XX:XX:XX:XX brd ff:ff:ff:ff:ff:ff link-netns cni-XX-XX-XX-XX-XX
    inet6 XX::XX:XX:XX:XX/64 scope link
       valid_lft forever preferred_lft forever
...
9: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
10: tun-852356528@eth0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip XX.XX.XX.XX peer XX.XX.XX.XX
    inet6 XX::XX:XX:XX:XX/64 scope link
       valid_lft forever preferred_lft forever

Feb 05 '24 19:02 waahhhh
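
A quick way to see that the detected name cannot be right is to compare it with the interfaces that actually exist. The sketch below uses the names visible in the `ip a` listing above (the listing is truncated, so the list is partial); `iface_exists` is a hypothetical helper, not part of k0sctl.

```shell
#!/bin/sh
# Interface names visible in the `ip a` listing from this thread.
# The part after "@" (e.g. "@eth0") is the link peer, not part of the name.
ifaces='lo
eth0
kube-bridge
veth4c8aee89
tunl0
tun-852356528'

# iface_exists NAME: succeed only if NAME matches a listed interface exactly.
iface_exists() {
  printf '%s\n' "$ifaces" | grep -Fxq -- "$1"
}

# Compare the name from the k0sctl log ("tun") with the one pinned in the fix.
for name in tun eth0; do
  if iface_exists "$name"; then
    echo "$name: exists"
  else
    echo "$name: no such interface"
  fi
done
```

With this sample list, `tun` is reported as missing and `eth0` as existing. On a live host the list could be taken from `ip -o link show` instead of the hard-coded sample.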

(ip route list scope global | grep -E "\b(172|10|192\.168)\.") || (ip route list | grep -m1 default)

This is the command k0sctl uses to detect the private interface. What does the output of that look like?

Feb 06 '24 07:02 kke

$ echo $((ip route list scope global | grep -E "\b(172|10|192\.168)\.") || (ip route list | grep -m1 default))
10.244.1.0/24 dev tun-852356528 proto 17 src 89.XX.XX.XX

$ echo $(ip route list scope global | grep -E "\b(172|10|192\.168)\.")
10.244.1.0/24 dev tun-852356528 proto 17 src 89.XX.XX.XX

$ echo $(ip route list | grep -m1 default)
default via 89.XX.XX.XX dev eth0 proto static

Feb 07 '24 23:02 waahhhh
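
The outputs above suggest how the wrong interface gets chosen: the pod network route `10.244.1.0/24` matches the private-range pattern, so the `eth0` default route is never consulted. The sketch below replays that logic in shell against the two route lines from this thread. The `pick_route` and `iface_from_route` helpers are hypothetical, and the rule of taking only the word characters after `dev` is an assumption that would explain why the log says `tun` rather than `tun-852356528`; it is not confirmed here as k0sctl's actual parsing.

```shell
#!/bin/sh
# Route lines copied from the outputs in this thread.
routes_global='10.244.1.0/24 dev tun-852356528 proto 17 src 89.XX.XX.XX'
routes_default='default via 89.XX.XX.XX dev eth0 proto static'

# pick_route GLOBAL_ROUTES ALL_ROUTES
# Same shape as the detection command: prefer a route that looks private,
# otherwise fall back to the first default route.
pick_route() {
  printf '%s\n' "$1" | grep -E '\b(172|10|192\.168)\.' ||
    printf '%s\n' "$2" | grep -m1 default
}

# iface_from_route ROUTE_LINE
# ASSUMPTION: only letters, digits and "_" after "dev " are kept,
# so a name containing "-" is cut at the dash.
iface_from_route() {
  printf '%s\n' "$1" | sed -nE 's/.*dev ([[:alnum:]_]+).*/\1/p' | head -n1
}

# Case 1: the pod network route exists and wins over the default route.
iface_from_route "$(pick_route "$routes_global" "$routes_default")"

# Case 2: no private-looking route (e.g. before the cluster was installed).
iface_from_route "$(pick_route "" "$routes_default")"
```

With the sample data, the first call prints `tun` and the second prints `eth0`. That is consistent with the log line `discovered tun as private interface` on upgrade, and with the initial installation having worked before the tunnel route existed.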

So, I think you don't have a private interface and it picks up eth0 as the fallback but only on the first round 🤔

Feb 12 '24 12:02 kke

Facing the same issue. Is there a workaround until it is fixed?

EDIT: The workaround for me was to use IP addresses instead of hostnames in the list of hosts. Not ideal for me, as I'd rather use domain names, but it works.

Mar 06 '24 10:03 d3adb5