Calico in eBPF mode intermittently sends out packets with same source and destination IP address (a.k.a. Land Attack)

Open igcherkaev opened this issue 3 years ago • 0 comments

Expected Behavior

The source IP and port of every packet are expected to be rewritten to the ClusterIP and port of the Service the client connected to.

Current Behavior

Intermittently, our firewall reports dropping packets due to a Land Attack, and tcpdump confirmed that such packets do leave the worker nodes' interfaces:

14:29:49.674082 IP 10.141.18.105.38345 > 10.141.18.105.38345: Flags [.], seq 189038244:189039612, ack 634038585, win 11, options [nop,nop,TS val 743969600 ecr 2067125250], length 1368

Where 10.141.18.105 is the IP address of a client talking to a cluster IP service in Kubernetes.
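For anyone who wants to scan a larger capture for the same symptom, here is a minimal sketch that flags tcpdump text lines whose source and destination IPv4 addresses match. It assumes the default `tcpdump -n` text format shown above; the helper name `is_land_packet` is hypothetical and not part of any Calico tooling.

```python
import re

# Matches the "IP src.port > dst.port:" part of a tcpdump -n text line (IPv4 only).
_ENDPOINTS = re.compile(
    r"IP (?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+):"
)


def is_land_packet(line: str) -> bool:
    """Return True if the line shows identical source and destination IPs."""
    match = _ENDPOINTS.search(line)
    if match is None:
        # Not an IPv4 TCP/UDP line in the expected format; ignore it.
        return False
    return match["src"] == match["dst"]


if __name__ == "__main__":
    sample = (
        "14:29:49.674082 IP 10.141.18.105.38345 > 10.141.18.105.38345: "
        "Flags [.], seq 189038244:189039612, ack 634038585, win 11, length 1368"
    )
    print(is_land_packet(sample))
```

The same condition can also be expressed directly as a capture filter, `ip[12:4] = ip[16:4]`, which compares the source and destination address fields of the IPv4 header.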

We hit this issue in large numbers when we upgraded from Calico 3.21.4 to 3.22.3 in production. Thousands of such packets were leaving the interfaces of many nodes in the cluster, and clients were experiencing timeouts talking to services in Kubernetes. We run iBGP, announce the subnet for k8s services, and route it via control plane nodes. We also have DSR mode enabled, so each worker sends return traffic out directly according to its routing table.

Some background on this is still available in this slack thread: https://calicousers.slack.com/archives/CUKP5S64R/p1657229418815789

Where @tomastigera summarized it with:

For posterity it seems that this fix in 3.23 which fixes this feature introduced in 3.22 to help reuse CT entries if the client recycles connections very quickly.

And indeed the number of such packets went down from thousands per hour to 15 per day. However, it is still happening. It appears that sometimes NAT is either not applied or applied incorrectly: instead of placing the correct IP/port into the source fields of the packet, Calico places the destination IP/port there.
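To make the symptom concrete, the following is a toy model of the check, not Calico's actual NAT code. It assumes a reply packet heading back to the client and compares its source against the Service endpoint the client connected to. The types, the function `classify_reply`, and the Service address `10.96.0.10:443` are all hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    ip: str
    port: int


class Packet(NamedTuple):
    src: Endpoint
    dst: Endpoint


def classify_reply(pkt: Packet, service: Endpoint, client: Endpoint) -> str:
    """Classify a reply packet leaving the node towards the client."""
    if pkt.dst != client:
        return "unrelated"
    if pkt.src == service:
        # Expected: the source was rewritten to the Service ClusterIP/port.
        return "ok"
    if pkt.src == pkt.dst:
        # Observed symptom: the destination IP/port ended up in the source.
        return "land"
    return "unexpected-source"


if __name__ == "__main__":
    client = Endpoint("10.141.18.105", 38345)
    service = Endpoint("10.96.0.10", 443)  # hypothetical ClusterIP and port
    print(classify_reply(Packet(service, client), service, client))
    print(classify_reply(Packet(client, client), service, client))
```

The second call models the packet from the tcpdump capture, where both endpoints are the client's own IP and port.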

Possible Solution

Unfortunately, I have no ideas for a possible solution. This issue is mainly to record that the problem exists and to track it in case anyone wants to look into it and fix it.

Steps to Reproduce (for bugs)

Nothing special is needed to reproduce it. It happens randomly and very rarely under normal production load.

Context

N/A

Your Environment

  • Calico version: 3.23.2 installed with manifest
  • Orchestrator version (e.g. kubernetes, mesos, rkt): Kubernetes 1.22.8
  • Operating System and version: Flatcar Container Linux by Kinvolk 3139.2.2 kernel 5.15.43-flatcar

Felix configuration:

  bpfEnabled: true
  bpfExternalServiceMode: DSR
  bpfKubeProxyIptablesCleanupEnabled: true
  bpfLogLevel: ""
  floatingIPs: Disabled
  logSeverityScreen: Info
  reportingInterval: 0s

igcherkaev • Jul 26 '22 00:07