Kube-flannel may configure Flannel with a stale interface IP

Open: aknuds1 opened this issue 6 years ago • 2 comments

Versions

  • Tectonic version (release or commit hash): bb1007decae57b4d933734a13d550587fa2d9c45
  • Terraform version (terraform version): 0.10.8
  • Platform (aws|azure|openstack|metal): digitalocean

What happened?

Flannel pods fail because they cannot find their corresponding network interfaces. It seems to me this happens because kube-flannel configures Flannel with an interface IP that can be stale.

From debugging, I found that the failures were caused by Flannel being configured with an interface IP belonging to nodes from my previous clusters, so I'm gonna guess that kube-flannel's $(POD_IP) variable is determined from DNS lookups against nodes. If the DNS lookups return cached/stale values, Flannel will be configured wrongly as seems to happen in my case.
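To make the failure mode concrete, here is a minimal Python model of what passing an IP to `--iface` implies: Flannel has to find a local interface that currently holds that address, and it cannot start if none does. The interface table, the addresses, and the `find_iface_by_ip` helper are hypothetical illustrations, not Flannel's actual (Go) code.

```python
import ipaddress

# Hypothetical snapshot of a node's interfaces: name -> assigned IPv4 addresses.
NODE_INTERFACES = {
    "lo": ["127.0.0.1"],
    "eth0": ["10.131.42.7"],
    "eth1": ["10.136.0.5"],
}


def find_iface_by_ip(interfaces, configured_ip):
    """Return the name of the interface that holds configured_ip.

    Simplified model of resolving --iface=<IP>: a stale IP (for example one
    that belonged to a node from a previous cluster) matches no interface,
    so a LookupError is raised instead of silently picking another one.
    """
    target = ipaddress.ip_address(configured_ip)
    for name, addrs in interfaces.items():
        if any(ipaddress.ip_address(a) == target for a in addrs):
            return name
    raise LookupError(f"no interface has address {configured_ip}")


print(find_iface_by_ip(NODE_INTERFACES, "10.131.42.7"))  # eth0
try:
    find_iface_by_ip(NODE_INTERFACES, "10.131.99.3")  # hypothetical stale address
except LookupError as err:
    print(err)
```

In this model the lookup is exact by design: a stale address produces an error rather than a guess, which matches the symptom of pods failing outright.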

Why is it necessary to set Flannel's --iface option explicitly instead of letting Flannel detect the interface automatically? I don't know how Flannel's automatic interface detection works, but hopefully it is less fragile than the current approach, which seems to break when DNS lookups return stale IPs.
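For comparison, Flannel's documentation describes the `--iface` default as the interface for the default route on the machine. The sketch below approximates that rule by reading a routing table in the layout of Linux's `/proc/net/route` and picking the usable default route with the lowest metric. The sample table is made up, the helper name is hypothetical, and Flannel's own Go implementation may differ in details.

```python
# Hypothetical routing table in the layout of Linux's /proc/net/route.
# Addresses are little-endian hex; Destination and Mask of 00000000 mean "default".
SAMPLE_ROUTES = (
    "Iface\tDestination\tGateway \tFlags\tRefCnt\tUse\tMetric\tMask\t\tMTU\tWindow\tIRTT\n"
    "eth0\t00000000\t0100830A\t0003\t0\t0\t0\t00000000\t0\t0\t0\n"
    "eth0\t0000830A\t00000000\t0001\t0\t0\t0\t0000FFFF\t0\t0\t0\n"
    "eth1\t0000880A\t00000000\t0001\t0\t0\t0\t0000FFFF\t0\t0\t0\n"
)

RTF_UP = 0x0001  # route flag: the route is usable


def default_route_iface(route_table: str) -> str:
    """Return the interface of the lowest-metric default route that is up."""
    best_iface, best_metric = None, None
    for line in route_table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 8:
            continue
        iface = fields[0]
        dest = int(fields[1], 16)
        flags = int(fields[3], 16)
        metric = int(fields[6])
        mask = int(fields[7], 16)
        is_default = dest == 0 and mask == 0
        if is_default and flags & RTF_UP and (best_metric is None or metric < best_metric):
            best_iface, best_metric = iface, metric
    if best_iface is None:
        raise LookupError("no default route found")
    return best_iface


print(default_route_iface(SAMPLE_ROUTES))  # eth0
```

On a Linux node, `default_route_iface(pathlib.Path("/proc/net/route").read_text())` would apply the same rule to the live table. This rule consults only the routing table, not DNS.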

aknuds1, Nov 21 '17 13:11

Hey @aknuds1, making it explicit ensures consistency and robustness across different platforms and environments. The POD_IP value comes directly from the IP that Kubernetes assigns to the pod at creation time (https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/), so I can't see how it could be stale.
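For context, `POD_IP` is typically injected through the Downward API (an `env` entry whose `valueFrom.fieldRef.fieldPath` is `status.podIP`), and Kubernetes expands `$(VAR_NAME)` references in a container's `command` and `args` from that container's environment variables, so `--iface=$(POD_IP)` reaches Flannel as a literal IP. For pods running with `hostNetwork: true`, as Flannel daemonsets usually do, `status.podIP` is the node's own IP as reported by the kubelet. The sketch below reimplements only the documented expansion rules in Python (unresolved references stay unchanged, `$$` escapes a `$`); `expand_refs` and the sample IP are hypothetical, not Kubernetes' code.

```python
def expand_refs(value: str, env: dict) -> str:
    """Expand $(NAME) references the way Kubernetes expands container args.

    - $(NAME) is replaced by env[NAME] when NAME is defined.
    - An unresolved or unterminated reference is left unchanged.
    - $$ is an escape for a literal $.
    """
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch != "$" or i + 1 == len(value):
            out.append(ch)
            i += 1
        elif value[i + 1] == "$":
            out.append("$")  # "$$" -> "$"
            i += 2
        elif value[i + 1] == "(":
            end = value.find(")", i + 2)
            if end == -1:
                out.append(value[i:])  # unterminated "$(": keep the rest verbatim
                break
            name = value[i + 2:end]
            out.append(env.get(name, value[i:end + 1]))
            i = end + 1
        else:
            out.append(ch)  # lone "$" that does not start a reference
            i += 1
    return "".join(out)


# Hypothetical environment as the Downward API might populate it.
container_env = {"POD_IP": "10.131.42.7"}
print(expand_refs("--iface=$(POD_IP)", container_env))  # --iface=10.131.42.7
```

Because the environment is fixed when the container starts, the address Flannel receives is whatever `status.podIP` held at that moment.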

enxebre, Jan 02 '18 15:01

@enxebre Consider what I said initially: "I'm gonna guess that kube-flannel's $(POD_IP) variable is determined from DNS lookups against nodes. If the DNS lookups return cached/stale values, Flannel will be configured wrongly as seems to happen in my case".

Unfortunately, it doesn't seem to work as well in practice as you think. I have seen for myself that $POD_IP is stale, so there's no question about that. I'm not sure how it happens, but as I said, my theory is that DNS lookups return stale values.

There is also the question of whether setting Flannel's --iface flag works better than Flannel's own automatic detection; that is my actual question here. Since the current method breaks, we should investigate whether letting Flannel detect the interface automatically works better.

aknuds1, Jan 03 '18 11:01