
KubeSpan incorrectly intercepts public IP traffic causing Cilium networking failures

Open stevefan1999-personal opened this issue 6 months ago • 2 comments

Issue Summary

When Kubernetes nodes are configured with public IP addresses on their primary network interface (eth0) and KubeSpan is enabled, KubeSpan inappropriately intercepts and redirects traffic between nodes using their public IPs. This behavior conflicts with Cilium's expected network topology, resulting in asymmetric routing and broken network connectivity.

Environment Details

  • Talos Version: v1.10.4
  • Cilium Version: v1.17.5
  • Cluster Configuration:
    • Each node comes with its own public IP address (e.g., 123.123.123.x/24)
    • Direct internet connectivity between nodes
    • KubeSpan enabled
    • Cilium as CNI

Problem Description

In deployments where Kubernetes nodes have public IP addresses assigned to their primary network interface (eth0), enabling KubeSpan causes unexpected network behavior. Specifically:

  1. Unnecessary Traffic Interception: KubeSpan intercepts and redirects traffic between nodes using their public IP addresses, despite these nodes already having direct public connectivity. This redundancy adds unnecessary overhead and complexity.

  2. Cilium Compatibility Issues: Cilium CNI expects inter-node communication to occur over the primary network interface (eth0) when using public IPs. However, with KubeSpan enabled, the traffic flow is altered:

    • Outbound packets are redirected through KubeSpan
    • Return traffic attempts to use the expected eth0 path
    • This creates asymmetric routing, breaking the connection
  3. Broken Return Path: The fundamental issue is that the return route becomes broken due to the mismatch between:

    • How packets are sent (via KubeSpan)
    • How Cilium expects them to return (via eth0)

Steps to Reproduce

  1. Deploy a Talos cluster with nodes having public IP addresses on eth0
  2. Enable KubeSpan in the cluster configuration
  3. Install Cilium as the CNI
  4. Attempt pod-to-pod communication across nodes
  5. Observe connectivity failures and asymmetric routing issues
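
For reference, step 2 above roughly corresponds to a machine configuration patch like the following (a minimal sketch; KubeSpan also relies on cluster discovery being enabled):

machine:
  network:
    kubespan:
      enabled: true   # route inter-node traffic over the KubeSpan WireGuard mesh
cluster:
  discovery:
    enabled: true     # KubeSpan uses node discovery to learn peer endpoints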

Expected Behavior

  • KubeSpan should detect when nodes have direct public connectivity and either:
    • Not intercept traffic between public IPs that can communicate directly
    • Provide a configuration option to exclude certain IP ranges from KubeSpan routing
  • Cilium should function normally with its expected network paths

Actual Behavior

  • KubeSpan intercepts all inter-node traffic, including public IP communication
  • Cilium experiences broken connectivity due to asymmetric routing
  • Pod-to-pod communication fails across nodes

Impact

This issue prevents the use of KubeSpan in environments where:

  • Nodes have public IP addresses
  • Cilium is the preferred CNI solution
  • Direct inter-node connectivity exists

This significantly limits deployment options for cloud environments where public IPs are assigned by default or required for other purposes.

Potential Solutions

  1. Add logic to KubeSpan to detect and skip interception for directly accessible public IPs
  2. Provide configuration options to exclude specific IP ranges from KubeSpan routing
  3. Implement proper symmetric routing handling when both KubeSpan and traditional routing are available
  4. Document the incompatibility and provide clear guidance for affected deployments
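
To make option 2 concrete, here is a purely hypothetical sketch of what such a knob could look like in the machine config (the excludeRoutes field below does not exist in the current Talos schema; it is only illustrative):

machine:
  network:
    kubespan:
      enabled: true
      excludeRoutes:             # hypothetical field, not part of Talos today
        - 123.123.123.0/24       # destinations in this range would bypass the WireGuard tunnel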

Additional Context

This issue is particularly problematic in VPS environments where:

  • Public IPs are assigned to instances automatically, since that is the simpler default
  • Cilium is chosen for its advanced networking features

In addition, I think KubeSpan should be enabled conditionally: if a node is behind NAT, enable KubeSpan for that specific node; otherwise, if every node is already reachable directly through its own node IP, KubeSpan should not be forced on at all.

Anyway, the biggest problem with KubeSpan for me is that I just want a consistent mesh; I don't really care about the encryption, since it adds a lot of overhead and the UDP transport can be throttled by ISPs...

stevefan1999-personal commented on Jun 21 '25

Related #11235

stevefan1999-personal commented on Jun 21 '25

Minimal Reproduction with Real-World Impact

Reproduction Environment

For maintainers interested in reproducing this issue, here's a cost-effective setup using real cloud infrastructure:

Infrastructure Requirements:

  • 2-3 VPS instances from budget providers (RackNerd/Hetzner/DigitalOcean)
    • Each with a public IPv4 address on the primary interface (not a private address from a VPC range)
    • 2GB RAM minimum
    • Direct internet connectivity between instances
  • Total cost: ~$10-20/month for testing

Cilium Configuration (Minimal Reproduction)

kubeProxyReplacement: true  # Critical: Full eBPF datapath
bpf:
  masquerade: true          # Key setting that triggers the issue
cluster:
  name: shitgamelab
k8sServiceHost: localhost
k8sServicePort: 7445
operator:
  replicas: 3
securityContext:
  capabilities:
    ciliumAgent: [ CHOWN, KILL, NET_ADMIN, NET_RAW, IPC_LOCK, SYS_ADMIN, SYS_RESOURCE, DAC_OVERRIDE, FOWNER, SETGID, SETUID ]
    cleanCiliumState: [ NET_ADMIN, SYS_ADMIN, SYS_RESOURCE ]
cgroup:
  autoMount:
    enabled: false
  hostRoot: /sys/fs/cgroup
envoy:
  enabled: false            # Simplified setup without Envoy
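
For completeness, this assumes the usual Talos "bring your own CNI" setup on the Talos side (a sketch of the relevant machine config; k8sServiceHost/k8sServicePort above point at KubePrism on its default port 7445):

cluster:
  network:
    cni:
      name: none        # Cilium is installed separately, e.g. via Helm
  proxy:
    disabled: true      # kube-proxy is disabled since Cilium's kubeProxyReplacement takes over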

Technical Root Cause Analysis

The conflict occurs at the kernel networking layer due to incompatible packet handling between KubeSpan and Cilium's eBPF implementation:

  1. Cilium's eBPF Masquerading (bpf.masquerade: true):

    • Bypasses traditional iptables/nftables completely
    • Directly manipulates socket buffers at the eBPF level
    • Expects packets to egress via the interface where the node IP is bound (eth0)
    • Creates BPF-based conntrack entries for connection tracking
  2. KubeSpan's nftables Rules:

    • Intercepts packets destined for other node IPs
    • Rewrites destination to redirect through WireGuard tunnel
    • Creates nftables-based NAT entries
  3. The Fatal Conflict:

    • Outbound path: KubeSpan's nftables rules catch packets first, redirecting them through WireGuard
    • Return path: Cilium's eBPF expects return traffic on eth0, but packets arrive via WireGuard
    • Missing conntrack: Cilium's BPF conntrack doesn't see the connection it expects
    • Result: Sockets remain in SYN_SENT state indefinitely, waiting for responses that never arrive correctly
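
If the analysis above is right, the single Helm value that triggers the clash is bpf.masquerade, so the (unsatisfying) workaround is to give up the eBPF masquerading path (a sketch; this trades away exactly the datapath this issue is about keeping):

bpf:
  masquerade: false   # fall back to iptables-based masquerading, which coexists with KubeSpan's nftables rules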

Why This Configuration Matters

This isn't an edge case - it represents a common production scenario:

  1. Classical Deployments: Many organizations deploy Kubernetes on cloud VPS with public IPs by default
  2. Security Requirements: KubeSpan provides essential encryption for multi-datacenter deployments
  3. Performance Requirements: Cilium's eBPF datapath is chosen for its superior performance and advanced features
  4. Cost Considerations: Using existing public IPs avoids additional private networking costs

Business Impact

  • Service Outages: Complete loss of pod-to-pod communication across nodes
  • Debugging Complexity: The issue manifests as mysterious connection timeouts
  • Limited Workarounds: Forces users to choose between security (KubeSpan) and performance (Cilium eBPF)

Proposed Fix

The solution requires KubeSpan to be eBPF-aware:

  1. Detect when Cilium is using eBPF masquerading
  2. Either integrate with Cilium's BPF maps or skip interception for public IPs
  3. Provide clear documentation about this incompatibility
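
As a rough illustration of point 1: Cilium publishes its datapath settings in the cilium-config ConfigMap in kube-system, so the detection could be as simple as inspecting data like the following (an excerpt sketch; key names as Cilium writes them, as far as I can tell):

apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  enable-bpf-masquerade: "true"    # present when bpf.masquerade is enabled via Helm
  kube-proxy-replacement: "true"   # present when kubeProxyReplacement is enabled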

This issue effectively makes two of Talos's most powerful features mutually exclusive in common deployment scenarios.

stevefan1999-personal commented on Jun 21 '25

Thanks for the issue and information to reproduce.

Does the Cilium node-to-node WireGuard implementation work? I'm assuming the easiest thing to do would be to document that KubeSpan shouldn't be used with Cilium and that users should instead rely on Cilium's own features.

rothgar commented on Jun 26 '25
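
For reference, Cilium's node-to-node WireGuard encryption mentioned above is typically enabled with Helm values along these lines (a sketch, not validated in this environment):

encryption:
  enabled: true
  nodeEncryption: true   # encrypt node-to-node traffic as well, not just pod-to-pod
  type: wireguard        # Cilium-managed WireGuard, independent of KubeSpan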

We can add documentation about the incompatibility, but I want to make it clear that Cilium with its default configuration works fine with KubeSpan. Cilium is a complex product and can be configured in many ways, some of which are not compatible with KubeSpan (and, I would guess, with other networking software as well).

smira commented on Jun 27 '25