Istio-proxy forwarding issue in K8s' nested containers setup
[ originally reported by @kylecarbs ]
In a K8s setup with a pod running a docker-in-docker (DinD) image, traffic generated within inner containers is blackholed in the host's network namespace. No forwarding issue is observed when traffic is generated within the (privileged) pod itself.
Environment:
rodny@gke-cluster-1-default-pool-e78f2962-q6x8:~/bin$ uname -r
5.4.0-1024-gcp
rodny@gke-cluster-1-default-pool-e78f2962-q6x8:~/bin$ lsb_release -d
Description: Ubuntu 18.04.5 LTS
rodny@gke-cluster-1-default-pool-e78f2962-q6x8:~/bin$ kubectl version
Client Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.12-gke.1504", GitCommit:"17061f5bd4ee34f72c9281d49f94b4f3ac31ac25", GitTreeState:"clean", BuildDate:"2020-10-19T17:02:11Z", GoVersion:"go1.13.15b4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.12-gke.1504", GitCommit:"17061f5bd4ee34f72c9281d49f94b4f3ac31ac25", GitTreeState:"clean", BuildDate:"2020-10-19T17:00:22Z", GoVersion:"go1.13.15b4", Compiler:"gc", Platform:"linux/amd64"}
In this setup, Istio's sidecar injection is automatically configured for the 'default' namespace, so istio-init sets up the iptables rules as expected:
root@sysbox-in-docker-6d8fc47bb6-xgz6v:/# iptables -L -t nat
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
LOG all -- anywhere anywhere LOG level debug prefix "rodny-nat-prerouting "
ISTIO_INBOUND tcp -- anywhere anywhere
DOCKER all -- anywhere anywhere ADDRTYPE match dst-type LOCAL
Chain INPUT (policy ACCEPT)
target prot opt source destination
LOG all -- anywhere anywhere LOG level debug prefix "rodny-nat-input "
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
LOG all -- anywhere anywhere LOG level debug prefix "rodny-nat-output "
ISTIO_OUTPUT tcp -- anywhere anywhere
DOCKER all -- anywhere !127.0.0.0/8 ADDRTYPE match dst-type LOCAL
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
LOG all -- anywhere anywhere LOG level debug prefix "rodny-nat-postrouting "
MASQUERADE all -- 172.24.0.0/16 anywhere
Chain DOCKER (2 references)
target prot opt source destination
RETURN all -- anywhere anywhere
Chain ISTIO_INBOUND (1 references)
target prot opt source destination
RETURN tcp -- anywhere anywhere tcp dpt:ssh
RETURN tcp -- anywhere anywhere tcp dpt:15020
ISTIO_IN_REDIRECT tcp -- anywhere anywhere
Chain ISTIO_IN_REDIRECT (2 references)
target prot opt source destination
REDIRECT tcp -- anywhere anywhere redir ports 15006
Chain ISTIO_OUTPUT (1 references)
target prot opt source destination
RETURN all -- 127.0.0.6 anywhere
ISTIO_IN_REDIRECT all -- anywhere !localhost
RETURN all -- anywhere anywhere owner UID match 1337
RETURN all -- anywhere anywhere owner GID match 1337
RETURN all -- anywhere localhost
ISTIO_REDIRECT all -- anywhere anywhere
Chain ISTIO_REDIRECT (1 references)
target prot opt source destination
REDIRECT tcp -- anywhere anywhere redir ports 15001
root@sysbox-in-docker-6d8fc47bb6-xgz6v:/#
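The LOG targets at the top of each built-in nat chain are what produce the packet traces quoted in this report. A minimal sketch of how such rules could be installed, assuming the prefix scheme visible in the output above. The function name print_nat_log_rules is hypothetical, and it only prints the commands so they can be reviewed before being run as root inside the pod:

```shell
# Print (do not run) the iptables commands that insert a LOG rule at the
# top of each built-in nat chain. "$1" is the log prefix (e.g. "rodny").
# Hypothetical helper for illustration; not part of Istio or Sysbox.
print_nat_log_rules() {
  prefix="$1"
  for chain in PREROUTING INPUT OUTPUT POSTROUTING; do
    # Lower-case the chain name to match the prefixes seen in the listing.
    lc=$(printf '%s' "$chain" | tr '[:upper:]' '[:lower:]')
    printf 'iptables -t nat -I %s 1 -j LOG --log-level debug --log-prefix "%s-nat-%s "\n' \
      "$chain" "$prefix" "$lc"
  done
}
```

Keep in mind that the nat table is only consulted for the first packet of a connection, so these rules log connection setup (e.g. the SYN) rather than every packet.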
The problem is specific to TCP traffic initiated from inner containers, which is blackholed at the host level. It appears to be a direct consequence of the following forwarding sequence:
- The TCP SYN packet generated from the inner container is processed by the PREROUTING chain and redirected to Istio's inbound handler (port 15006). Note that the source address here (172.24.0.2) corresponds to the egress iface of the inner container, and 172.24.0.1 is the address of docker0 within this pod.
... nat-prerouting IN=docker0 OUT= PHYSIN=vethfd9bdf3 MAC=02:42:c7:d0:5d:05:02:42:ac:18:00:02:08:00 SRC=172.24.0.2 DST=74.125.20.101
- The packet exits the Istio logic and is then processed by the OUTPUT chain; however, it is now sourced from 127.0.0.6:
... nat-output IN= OUT=eth0 SRC=127.0.0.6 DST=74.125.20.101 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=52031 DF PROTO=TCP SPT=51503 DPT=80 WINDOW=42600 RES=0x00 SYN URGP=0
- The packet hits the POSTROUTING chain, still sourced from 127.0.0.6:
... nat-postrouting IN= OUT=eth0 SRC=127.0.0.6 DST=74.125.20.101 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=52031 DF PROTO=TCP SPT=51503 DPT=80 WINDOW=42600 RES=0x00 SYN URGP=0
- The packet ends up being discarded in the host's network namespace, as 127.0.0.6 is not routable:
... kernel: IPv4: martian source 74.125.20.101 from 127.0.0.6, on dev cbr0
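The sequence above boils down to one symptom: a packet leaving a non-loopback interface with a 127.0.0.0/8 source address. A rough filter for spotting that in LOG output, assuming the usual netfilter LOG line layout with whitespace-separated OUT= and SRC= fields as in the traces above. find_loopback_egress is a hypothetical helper name:

```shell
# Read netfilter LOG lines on stdin and print "OUT=<iface> SRC=<addr>" for
# every packet that egresses a non-loopback interface with a 127.x.x.x source.
# Hypothetical helper for illustration only.
find_loopback_egress() {
  awk '
    {
      out = ""; src = ""
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^OUT=/) out = substr($i, 5)   # text after "OUT="
        if ($i ~ /^SRC=/) src = substr($i, 5)   # text after "SRC="
      }
      # Lines with an empty OUT= (e.g. PREROUTING) are skipped.
      if (out != "" && out != "lo" && src ~ /^127\./)
        print "OUT=" out " SRC=" src
    }
  '
}
```

It could be fed from the kernel log inside the pod, for example with `dmesg | find_loopback_egress`.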
I have found a workaround that masquerades all traffic hitting the POSTROUTING chain with source address 127.0.0.6 (unless it is leaving through docker0), but I feel that this may not be a proper or generic enough solution for this problem.
$ iptables -t nat -A POSTROUTING -s 127.0.0.6 ! -o docker0 -j MASQUERADE
$ iptables -L -t nat -v -n
...
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 MASQUERADE all -- * !docker0 172.24.0.0/16 0.0.0.0/0
0 0 MASQUERADE all -- * !docker0 127.0.0.6 0.0.0.0/0
...
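Since -A appends another copy of the rule every time it runs, a re-runnable variant of the workaround could check for the rule first with -C. This is a sketch only, not a recommended fix. ensure_masquerade and the IPTABLES override are hypothetical names introduced here for illustration:

```shell
# Append the 127.0.0.6 masquerade rule only if it is not already present.
# IPTABLES can be set to another command (e.g. a stub for a dry run);
# with the default "iptables" this must run as root inside the pod.
ensure_masquerade() {
  ipt="${IPTABLES:-iptables}"
  # -C exits non-zero when no matching rule exists.
  if ! "$ipt" -t nat -C POSTROUTING -s 127.0.0.6 ! -o docker0 -j MASQUERADE 2>/dev/null; then
    "$ipt" -t nat -A POSTROUTING -s 127.0.0.6 ! -o docker0 -j MASQUERADE
  fi
}
```

Note that rules added this way live only in the pod's network namespace and would need to be reapplied whenever the pod is recreated.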
This is clearly not a Sysbox issue, as traffic is dropped regardless of whether the inner container is launched with Sysbox or with the regular runc. However, we are tracking it here because this is a common Sysbox deployment setup in K8s scenarios.