network-controller-harvester

fix: http(s) connection timeout from VMs in VLAN network

Open mingshuoqiu opened this issue 1 year ago • 2 comments

Problem: A VM attached to the VLAN network fails to reach the management URL over http(s). However, there's no problem with SSH/ping, which connect to ports other than http(s), since http(s) needs an extra route to the CNI interface.

Solution: The http(s) egress from the uplink bridge interface should be untagged so that it is routed correctly, since routing is determined at L3 but the VLAN-tagged packet is L2.

Related Issue: https://github.com/harvester/harvester/issues/4359

mingshuoqiu avatar Feb 23 '24 06:02 mingshuoqiu

As discussed, I suggest we completely disable the /net/bridge/bridge-nf-call-iptables kernel tunable during the network controller initialization because

I still have concerns about simply disabling bridge-nf-call-iptables as the solution. I hit another problem today when creating a worker node on a different VLAN from the management node. It fails to update the secret because http(s) requests to the management URL time out. However, ping/ssh from the worker node to the management node's IP is OK.

Then I created a worker node in the same VLAN as the management node, and there was no such problem, since all traffic from this node to the management URL is SNAT/DNAT'ed from 172.24.1.56 to 172.24.1.52 to 10.2.4.0 to 10.2.0.45. The http(s) traffic takes flannel as the next hop instead of the default gateway.

Maybe we need to list more use cases to find out the real solution.

mingshuoqiu avatar Mar 06 '24 07:03 mingshuoqiu

I feel it is a different issue since no bridge CNI is involved; it's just pure canal stuff. But I strongly agree with you that we need to test thoroughly if we want to turn off the kernel tunable. Thanks!

starbops avatar Mar 07 '24 03:03 starbops