Failure to establish initial connection to clustermesh-apiserver NodePort when Tunneling + KPR + WireGuard + Host firewall are enabled
Is there an existing issue for this?
- [X] I have searched the existing issues
What happened?
When both tunneling and host firewall are enabled, pod-to-node traffic is encapsulated to avoid masquerading the source address and to preserve the source identity. This is achieved by setting the tunnel endpoint field of the respective ipcache entries to the node IP address itself.
Consider a clustermesh scenario with the clustermesh-apiserver exposed through a NodePort and KPR enabled. In this setup, traffic from an agent to the remote clustermesh-apiserver can be routed asymmetrically: via the native device in one direction, and via the tunnel in the other.
Let's assume that we have two clusters (single node each for simplicity, although it doesn't matter). Initially, the agent in the first cluster attempts to connect to the remote clustermesh-apiserver NodePort. The corresponding entry is not present in the ipcache, hence traffic is routed natively. The same applies to the return traffic, so the connection is established successfully, data is pulled from the clustermesh-apiserver, and the remote node IPs are eventually inserted into the ipcache on the node of the first cluster.
At this point, the agent in the second cluster tries to reach the clustermesh-apiserver in the first cluster. Forward packets are still routed natively (as no ipcache entry for the remote node is present there yet), but return traffic is tunneled instead (following the ipcache entry on the node of the first cluster), causing the asymmetry.
This is particularly problematic if WireGuard is enabled as well, because the encapsulated traffic is then encrypted by WireGuard, but discarded at the destination because the decryption key for that node has not been configured yet (it is retrieved from the remote cluster's clustermesh-apiserver). Please note that, depending on timing, this issue may or may not occur, but it typically occurs as long as the clusters are configured at slightly different points in time. Once the agents in one of the clusters have configured the ipcache entries for the remote node, the agents (or pods, in case of kvstoremesh) in the other cluster will never succeed in connecting.
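To make the sequence of events easier to follow, here is a minimal toy model of the scenario in Python. It is only a sketch of the reasoning above, not Cilium's actual datapath: the `Node` class, `deliver`, `connect`, and the IP addresses are all hypothetical names introduced for illustration, and the model assumes that a node learns the remote ipcache entry and the remote WireGuard peer at the same time, once its connection to the remote clustermesh-apiserver succeeds.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Node:
    """Hypothetical, heavily simplified stand-in for a node running the agent."""
    name: str
    ip: str
    # Toy ipcache: remote node IP -> tunnel endpoint (the node IP itself).
    ipcache: Dict[str, str] = field(default_factory=dict)
    # Remote node IPs for which a WireGuard peer (key) has been configured.
    wg_peers: Set[str] = field(default_factory=set)

    def learn_remote(self, remote: "Node") -> None:
        # Assumption: both are learned together after a successful sync.
        self.ipcache[remote.ip] = remote.ip
        self.wg_peers.add(remote.ip)

    def egress_path(self, dst_ip: str) -> str:
        # Tunnel only if an ipcache entry with a tunnel endpoint exists.
        return "tunnel" if dst_ip in self.ipcache else "native"


def deliver(src: Node, dst: Node) -> Tuple[str, bool]:
    """Return (path taken, whether dst accepts the packet)."""
    path = src.egress_path(dst.ip)
    if path == "native":
        return path, True
    # Tunneled traffic is WireGuard-encrypted; dst drops it unless it
    # already has src configured as a peer.
    return path, src.ip in dst.wg_peers


def connect(client: Node, server: Node) -> Tuple[str, str, bool]:
    """Model one agent -> remote clustermesh-apiserver connection attempt."""
    fwd_path, fwd_ok = deliver(client, server)
    rep_path, rep_ok = deliver(server, client)
    ok = fwd_ok and rep_ok
    if ok:
        client.learn_remote(server)
    return fwd_path, rep_path, ok


if __name__ == "__main__":
    a = Node("cluster1-node", "192.0.2.1")
    b = Node("cluster2-node", "198.51.100.1")
    print("cluster1 -> cluster2:", connect(a, b))
    print("cluster2 -> cluster1:", connect(b, a))
```

In this model the first attempt succeeds with both directions routed natively, while the second attempt has a native forward path and a tunneled (and dropped) reply path. Since the second cluster never learns the first cluster's node, retrying does not help, which matches the "will never succeed in connecting" behavior described above.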
Cilium Version
Tested the main branch as of today, but I'm pretty confident it affects stable versions as well -- although one difference which may play a role is that we started encrypting all encapsulated packets (instead of skipping encapsulation altogether when WireGuard is enabled) in v1.15 (while that is opt-in in v1.14).
Code of Conduct
- [X] I agree to follow this project's Code of Conduct