network-controller-harvester
fix: HTTP(S) connection timeout from VMs in VLAN network
Problem: A VM attached to a VLAN network fails to reach the management URL over HTTP(S). SSH and ping, which do not go through the HTTP(S) ports, work fine; only HTTP(S) is affected because it needs an extra route to the CNI interface.
Solution: HTTP(S) egress traffic from the uplink bridge interface should be untagged so that it is routed correctly: routing decisions are made at L3, but the VLAN tag is an L2 construct.
Related Issue: https://github.com/harvester/harvester/issues/4359
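To make the L2/L3 point concrete: an 802.1Q tag is a 4-byte field (TPID 0x8100 plus the tag control information) inserted right after the source MAC address, and untagging removes it so the host's IP stack sees an ordinary frame whose payload it can route. The Go sketch below is illustrative only; the helpers untagFrame and vlanID are hypothetical and are not taken from the network controller's code.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// tpid8021Q is the EtherType value that marks an IEEE 802.1Q VLAN tag.
const tpid8021Q = 0x8100

// untagFrame returns a copy of an Ethernet frame with its outer 802.1Q tag
// removed. Untagged frames are returned as an unchanged copy.
func untagFrame(frame []byte) ([]byte, error) {
	if len(frame) < 14 {
		return nil, errors.New("frame shorter than an Ethernet header")
	}
	if binary.BigEndian.Uint16(frame[12:14]) != tpid8021Q {
		out := make([]byte, len(frame))
		copy(out, frame)
		return out, nil
	}
	if len(frame) < 18 {
		return nil, errors.New("truncated 802.1Q tag")
	}
	out := make([]byte, 0, len(frame)-4)
	out = append(out, frame[:12]...) // destination + source MAC
	out = append(out, frame[16:]...) // inner EtherType + L3 payload
	return out, nil
}

// vlanID extracts the 12-bit VLAN ID from a tagged frame.
func vlanID(frame []byte) (uint16, bool) {
	if len(frame) < 18 || binary.BigEndian.Uint16(frame[12:14]) != tpid8021Q {
		return 0, false
	}
	return binary.BigEndian.Uint16(frame[14:16]) & 0x0FFF, true
}

func main() {
	// Hypothetical frame: zeroed MACs, VLAN 100, inner EtherType IPv4 (0x0800).
	frame := append(make([]byte, 12), 0x81, 0x00, 0x00, 0x64, 0x08, 0x00, 0x45)
	id, _ := vlanID(frame)
	out, _ := untagFrame(frame)
	// Prints: 100 19 15 true
	fmt.Println(id, len(frame), len(out), binary.BigEndian.Uint16(out[12:14]) == 0x0800)
}
```

After untagging, the frame's EtherType is IPv4 again, which is what an L3 routing decision operates on.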
As discussed, I suggest we completely disable the /net/bridge/bridge-nf-call-iptables kernel tunable during network controller initialization because …
I still have concerns about simply disabling bridge-nf-call-iptables as the solution. Today I hit another problem when creating a worker node on a different VLAN from the management node: it fails to update the secret because HTTP(S) requests to the management URL time out. However, ping and SSH from the worker node to the management node's IP work fine.
When I then created a worker node in the same VLAN as the management node, there was no such problem. Since all traffic from this node to the management URL is SNATed/DNATed from 172.24.1.56 → 172.24.1.52 to 10.2.4.0 → 10.2.0.45, HTTP(S) takes flannel as the next hop instead of the default gateway.
Maybe we need to list more use cases to find the real solution.
I feel it is a different issue since no bridge CNI is involved; it's purely a Canal issue. But I strongly agree with you that we need to test thoroughly if we want to turn off the kernel tunable. Thanks!