
[v3.26] The can-reach parameter supports multiple target values

Open luanshuo opened this issue 1 year ago • 6 comments

Description

Related issues/PRs

Todos

Our cluster has to tolerate a variety of unstable situations and cannot be connected to the external network, so we want to use the can-reach parameter with support for multiple target values. I made some changes to that end, and so far they work well.

  • [x] Tests: In our cluster, /etc/hosts on every node only has resolution entries for master01 through master03.


In the YAML resource of calico-node, I set IP_AUTODETECTION_METHOD to can-reach=master08,master09,master01.

calico-node's log shows that it correctly found the available IP.


  • [x] Documentation: You can set can-reach to multiple comma-separated target values, similar to cidr:

        - name: IP_AUTODETECTION_METHOD
          value: can-reach=master08,master09,master01

  • [ ] Release note: The can-reach parameter supports multiple target values

Release Note

TBD

luanshuo avatar Dec 14 '23 11:12 luanshuo

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Dec 14 '23 11:12 CLAassistant

@luanshuo I'm not 100% sure this change is necessary. The can-reach method of detection doesn't require network connectivity: it doesn't send packets, it just looks up the interface that would be used if a connection were going to be established.

I suppose this would be necessary if you had a can-reach target that was unreachable based on the host routing (i.e., no routes on the host cover it) - is that what your network looks like?

I have a slight worry about allowing multiple targets for this option: if the destination used for the lookup changes over time, the node's auto-detected IP can change unnecessarily, which can cause unwanted network instability. That said, it should only happen if not all of the provided can-reach targets resolve to the same interface, so I'm not necessarily against this. I just wonder if there's a better way to satisfy your needs than using the can-reach parameter.
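
The lookup-only behavior described above can be illustrated with a small Go sketch. It assumes the lookup is done by "connecting" a UDP socket, a common way to ask the kernel which source address it would use; the thread does not confirm that this is exactly how calico-node does it, and the function name localIPFor is hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// localIPFor returns the source IP the kernel would pick to reach target.
// Connecting a UDP socket only performs a local route lookup and records
// the chosen source address; no packet is sent unless data is written.
// (Hypothetical helper for illustration, not Calico's code.)
func localIPFor(target string) (string, error) {
	// The port is arbitrary here because nothing is ever sent.
	conn, err := net.Dial("udp", net.JoinHostPort(target, "80"))
	if err != nil {
		return "", err
	}
	defer conn.Close()

	addr, ok := conn.LocalAddr().(*net.UDPAddr)
	if !ok {
		return "", fmt.Errorf("unexpected local address type %T", conn.LocalAddr())
	}
	return addr.IP.String(), nil
}

func main() {
	ip, err := localIPFor("127.0.0.1")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("source IP:", ip)
}
```

If target is a hostname rather than an IP, net.Dial has to resolve it first, which is where the dependency on /etc/hosts comes from. The call also fails when no route on the host covers the target.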

caseydavenport avatar Dec 14 '23 17:12 caseydavenport

@caseydavenport I'm not sure if this feature is needed by everyone, but it is necessary in the scenarios we are facing.

  1. In our scenario, we provide a node deletion function that also deletes the node's resolution entries from /etc/hosts. If we delete the master01 node from the cluster while using can-reach=master01, can-reach=master01 can no longer find the correct IP address. With multiple values, such as can-reach=master02,master03, this can be avoided.

  2. We have a number of old clusters in which /etc/hosts on some nodes has no resolution entry for master01. We do not want to add that entry on every node, because nodes are also added to and deleted from these old clusters.

So we want to be able to set multiple values, so that we don't have to make too many code changes or perform too many node operations.
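
As a rough illustration of the multi-target behavior requested here, the following Go sketch tries each can-reach target in order and uses the first one whose lookup succeeds. It is not the code from this PR: parseCanReach, firstReachable, and fakeLookup are hypothetical names, the lookup is injected so the example runs without a real cluster, and the 192.0.2.x addresses are placeholders.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// lookupFunc maps a can-reach target to the local IP the node would use
// to reach it, or returns an error if the target cannot be resolved or routed.
type lookupFunc func(target string) (string, error)

// parseCanReach splits "can-reach=a,b,c" into its targets.
func parseCanReach(method string) ([]string, error) {
	const prefix = "can-reach="
	if !strings.HasPrefix(method, prefix) {
		return nil, fmt.Errorf("not a can-reach method: %q", method)
	}
	var targets []string
	for _, t := range strings.Split(strings.TrimPrefix(method, prefix), ",") {
		if t = strings.TrimSpace(t); t != "" {
			targets = append(targets, t)
		}
	}
	if len(targets) == 0 {
		return nil, errors.New("can-reach has no targets")
	}
	return targets, nil
}

// firstReachable returns the first target whose lookup succeeds,
// together with the local IP found for it.
func firstReachable(targets []string, lookup lookupFunc) (string, string, error) {
	var failures []string
	for _, t := range targets {
		ip, err := lookup(t)
		if err == nil {
			return t, ip, nil
		}
		failures = append(failures, fmt.Sprintf("%s: %v", t, err))
	}
	return "", "", fmt.Errorf("no usable can-reach target (%s)", strings.Join(failures, "; "))
}

// fakeLookup stands in for a real route lookup: only targets present in
// the map resolve, mimicking a node whose /etc/hosts lacks some entries.
func fakeLookup(localIPs map[string]string) lookupFunc {
	return func(target string) (string, error) {
		if ip, ok := localIPs[target]; ok {
			return ip, nil
		}
		return "", errors.New("no such host")
	}
}

func main() {
	targets, err := parseCanReach("can-reach=master08,master09,master01")
	if err != nil {
		fmt.Println(err)
		return
	}
	lookup := fakeLookup(map[string]string{"master01": "192.0.2.10"})
	target, ip, err := firstReachable(targets, lookup)
	fmt.Println(target, ip, err)
}
```

The order-dependent fallback is also where the instability concern raised earlier comes from: if the first usable target changes and the new one routes through a different interface, the detected IP changes with it.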

luanshuo avatar Dec 15 '23 01:12 luanshuo

@luanshuo I guess I just wonder if can-reach is really the right mechanism for that environment then. There are a number of other options available: https://docs.tigera.io/calico-cloud/networking/ipam/ip-autodetection#autodetection-methods

For example, specifying an interface regex or using the Kubernetes InternalIP of the node might be more appropriate here, unless all of your nodes have different interface naming?

Alternatively, I think you should be able to use can-reach with an IP address instead of using the hostnames of the nodes, so that can-reach is no longer coupled to /etc/hosts lookups at all. The IP address doesn't even need to be a real address in the network, since like I said above can-reach doesn't send traffic, it just does a local routing lookup.

Like I said - not necessarily against this PR - but I'd rather find a way with existing methods if we can, since I think specifying multiple can-reach options has some corner-cases that could result in instability when node IPs / interfaces change.

caseydavenport avatar Dec 15 '23 17:12 caseydavenport

@caseydavenport We didn't just try can-reach. We started with interface=eth0, moved to interface=^e.*, then used cidr=10.10.10.0/24 and other methods. However, because of the complexity of the scenario we face, each of these options covers only part of it.

That includes using can-reach with an IP address, which we also tested, but as I said before, our scenario is complex:

  1. The same cluster may have different NIC names: eth0, ens33, bound0, or others
  2. In the same cluster, the egress IP addresses of nodes are on different network segments
  3. Multiple NICs exist on the same node
  4. Any node must account for the IP switching scenario
  5. The cluster may not be connected to the external network
  6. In the same cluster, a given entry may not be present in /etc/hosts on every node
  7. ...

So we still hope that can-reach can support multiple values, which would solve a lot of our problems
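
To make the limits of the interface= and cidr= methods concrete, here is a small Go sketch that applies the regex and CIDR from this comment to a few sample values. It only demonstrates plain regex and CIDR matching; Calico's exact matching rules may differ, the helper names are hypothetical, and the 10.10.20.5 address is an invented example of a node on another segment.

```go
package main

import (
	"fmt"
	"net"
	"regexp"
)

// matchInterfaces returns the interface names that match pattern.
func matchInterfaces(pattern string, names []string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var matched []string
	for _, n := range names {
		if re.MatchString(n) {
			matched = append(matched, n)
		}
	}
	return matched, nil
}

// inCIDR reports whether ip falls inside the given CIDR block.
func inCIDR(cidr, ip string) (bool, error) {
	_, network, err := net.ParseCIDR(cidr)
	if err != nil {
		return false, err
	}
	parsed := net.ParseIP(ip)
	if parsed == nil {
		return false, fmt.Errorf("invalid IP %q", ip)
	}
	return network.Contains(parsed), nil
}

func main() {
	// NIC names taken from the list above.
	names, _ := matchInterfaces("^e.*", []string{"eth0", "ens33", "bound0"})
	fmt.Println("interface=^e.* matches:", names)

	// A node on a different segment falls outside the configured CIDR.
	for _, ip := range []string{"10.10.10.5", "10.10.20.5"} {
		ok, _ := inCIDR("10.10.10.0/24", ip)
		fmt.Println(ip, "in 10.10.10.0/24:", ok)
	}
}
```

A single regex or CIDR has to be widened for every new NIC name or network segment, which is the gap described in points 1 and 2.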

luanshuo avatar Dec 15 '23 17:12 luanshuo

Not sure if you're still interested in this one @luanshuo - there are two more comments open.

caseydavenport avatar Mar 20 '24 17:03 caseydavenport

Closing due to inactivity, but happy to reopen if you pick this up again.

caseydavenport avatar Apr 23 '24 19:04 caseydavenport