IP autodetection with Calico
Is your feature request related to a problem? Please describe.
We had a weird issue where pods on a new k0s cluster were unable to talk to pods on another node/host. It turned out that the auto detection in Calico had somehow guessed the wrong interface.
So instead of using eth0 like on other clusters, it used eth1, which is a management network and not supposed to be used for node-to-node communication. This meant that the vxlan.calico interface lost all traffic.
We tried tcpdump etc., which wasn't very helpful. I already created an issue in Calico to find out whether there is anything one can do to effectively debug/troubleshoot the tunnel, since there doesn't seem to be anything obvious. The calicoctl tool is also in a broken state (e.g. regarding its use of docker-cli): it either demands to be executed on the nodes directly or works with a $KUBECONFIG.
Describe the solution you would like
I see that the autodetection method is currently empty by default: https://github.com/k0sproject/k0s/blob/1311fb0b73bb3d99202010f802e486aca5b813d4/pkg/apis/k0s/v1beta1/calico.go#L64
From the docs, it seems that Calico will then use the first interface found: https://docs.tigera.io/calico/latest/networking/ipam/ip-autodetection#autodetection-methods
I currently don't know why this is (the expected) eth0 on some clusters and eth1 on others.
I would propose to use kubernetes-internal-ip or can-reach instead? Maybe some docs on how they can be used would be helpful as well.
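For reference, a minimal sketch of what either option might look like when set by hand today, assuming the `envVars` map under `spec.network.calico` is passed through to calico-node unchanged. The `can-reach` destination `10.0.0.1` is a hypothetical address and would need to be something reachable only over the node-to-node network. Whether `kubernetes-internal-ip` is accepted depends on the bundled Calico version, which is not verified here.

```yaml
apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  name: k0s
spec:
  network:
    provider: calico
    calico:
      envVars:
        # Option 1: use the InternalIP that Kubernetes already reports for the node.
        IP_AUTODETECTION_METHOD: "kubernetes-internal-ip"
        # Option 2 (alternative, use instead of option 1): pick the interface
        # whose route reaches the given destination. 10.0.0.1 is a placeholder.
        # IP_AUTODETECTION_METHOD: "can-reach=10.0.0.1"
```

Only one `IP_AUTODETECTION_METHOD` value can be active at a time, so the second option is left commented out.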
Describe alternatives you've considered
Configuring this myself.
Additional context
Could be that a bump in Calico is needed for the kubernetes-internal-ip one, but I am not sure.
This is not a k0s problem, it's standard Calico configuration. Please use:
provider: calico
calico:
  mode: "bird"
  envVars:
    IP_AUTODETECTION_METHOD: "interface=eth1"
    IP6_AUTODETECTION_METHOD: "interface=eth1"
in your k0s config. k0s should not try to guess or "help", since systems are different in unpredictable ways. As you can see, I want eth1 to be used (in my case eth0 is the maintenance network).
As an example, some programs (e.g. early cri-o) checked the default route and selected that interface. I usually have multiple targets for my default route (ECMP), and that didn't work, of course. So I had to temporarily set a fake default route, and then reset it after the "clever" programs had performed their magical "help". No thanks to that!
@uablrek Thanks, but I disagree.
From discussions about Calico on tickets, customization should be avoided when possible in order to make updates/upgrades easier. I also already know (and shared) what the workaround is, either in the config or at runtime.
Regardless, I am advocating for the use of kubernetes-internal-ip or can-reach as a new default, as it would address this perfectly and would make Calico work for 99% of all use cases. In addition, the node IPs/interfaces are already supplied via the k0s config etc., so using kubernetes-internal-ip would avoid more duplicate info (for similar purposes) in the config file.