k0s IP autodetection with Calico

Is your feature request related to a problem? Please describe.

We had a weird issue where pods on a new k0s cluster were unable to talk to pods on another node/host. It turned out that the auto detection in Calico had somehow guessed the wrong interface.

So instead of using eth0 like on other clusters, it used eth1, which is on the management network and not supposed to be used for node-to-node communication. This meant that the vxlan.calico interface lost all traffic.

We tried tcpdump and similar tools, which wasn't very helpful. I have already opened an issue with Calico to find out whether there is a way to effectively debug or troubleshoot the tunnel, since nothing obvious seems to exist and the calicoctl tool is in a broken state (e.g. regarding its use of docker-cli): it either demands to be executed on the nodes directly or works with a $KUBECONFIG.

Describe the solution you would like

I see that the IP autodetection method is currently empty by default: https://github.com/k0sproject/k0s/blob/1311fb0b73bb3d99202010f802e486aca5b813d4/pkg/apis/k0s/v1beta1/calico.go#L64

From the docs, it seems that Calico will use the first interface found: https://docs.tigera.io/calico/latest/networking/ipam/ip-autodetection#autodetection-methods

Why this is (the expected) eth0 on some clusters and eth1 on others is currently unknown to me.

I would propose using kubernetes-internal-ip or can-reach as the default instead. Some docs on how they can be used would be helpful as well.
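For illustration, here is a rough sketch of how can-reach could be set by hand today. It assumes the `envVars` map under `spec.network.calico` is passed through to calico-node unchanged and that the bundled Calico version accepts this value. The destination `10.0.0.1` is a hypothetical address on the node-to-node network, not one taken from a real cluster.

```yaml
apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  name: k0s
spec:
  network:
    provider: calico
    calico:
      envVars:
        # can-reach picks the local address that the node's routing table
        # would use to reach the given destination.
        # 10.0.0.1 is a placeholder (assumption): replace it with an address
        # that is only reachable via the node-to-node network.
        IP_AUTODETECTION_METHOD: "can-reach=10.0.0.1"
```

The appeal of can-reach here is that it does not depend on interface names, so the same config would work on hosts where the node-to-node network is eth0 and on hosts where it is eth1.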

Describe alternatives you've considered

Configuring this myself.

Additional context

Could be that a bump in Calico is needed for the kubernetes-internal-ip one, but I am not sure.

Nov 14 '23 15:11 till

This is not a k0s problem, it's standard Calico configuration. Please use:

          provider: calico
          calico:
            mode: "bird"
            envVars:
              IP_AUTODETECTION_METHOD: "interface=eth1"
              IP6_AUTODETECTION_METHOD: "interface=eth1"

in your k0s config. k0s should not try to guess or "help", since systems differ in unpredictable ways. As you can see, I want eth1 to be used (in my case eth0 is the maintenance network).

As an example, some programs (e.g. early cri-o) checked the default route and selected that interface. I usually have multiple targets for my default route (ECMP), and that didn't work, of course. So I had to temporarily set a fake default route, and then reset it after the "clever" programs had worked their magical "help". No thanks to that!

Jun 27 '24 05:06 uablrek

@uablrek Thanks, but I disagree.

From discussions on Calico tickets, customization should be avoided where possible in order to make updates/upgrades easier, and I already know (and shared) what the workaround is, either in the config or at runtime.

Regardless, I am advocating for kubernetes-internal-ip or can-reach as the new default, as it would address this perfectly and make Calico work for 99% of all use cases. On top of that, the node IPs/interfaces are already supplied via the k0s config, so using kubernetes-internal-ip would avoid duplicating that information (for similar purposes) in the config file.
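To make the proposal concrete, this is roughly what the suggested default would correspond to if written out explicitly. It is only a sketch: it assumes the Calico version shipped with k0s supports the kubernetes-internal-ip method and that `envVars` is forwarded to calico-node as-is.

```yaml
spec:
  network:
    provider: calico
    calico:
      envVars:
        # Reuse the address Kubernetes already reports as the node's
        # InternalIP, so no interface name has to be repeated here.
        # Assumption: the bundled Calico version supports this method.
        IP_AUTODETECTION_METHOD: "kubernetes-internal-ip"
```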

Jul 12 '24 16:07 till