Istio-proxy forwarding issue in K8s' nested containers setup

Open rodnymolina opened this issue 5 years ago • 0 comments

[ originally reported by @kylecarbs ]

In a K8s setup with a POD running a docker-in-docker (DinD) image, traffic generated within inner containers is blackholed in the host's network namespace. No forwarding issue is observed when traffic is generated within the (privileged) POD itself.

Environment:

rodny@gke-cluster-1-default-pool-e78f2962-q6x8:~/bin$ uname -r
5.4.0-1024-gcp

rodny@gke-cluster-1-default-pool-e78f2962-q6x8:~/bin$ lsb_release -d
Description:	Ubuntu 18.04.5 LTS

rodny@gke-cluster-1-default-pool-e78f2962-q6x8:~/bin$ kubectl version
Client Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.12-gke.1504", GitCommit:"17061f5bd4ee34f72c9281d49f94b4f3ac31ac25", GitTreeState:"clean", BuildDate:"2020-10-19T17:02:11Z", GoVersion:"go1.13.15b4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17+", GitVersion:"v1.17.12-gke.1504", GitCommit:"17061f5bd4ee34f72c9281d49f94b4f3ac31ac25", GitTreeState:"clean", BuildDate:"2020-10-19T17:00:22Z", GoVersion:"go1.13.15b4", Compiler:"gc", Platform:"linux/amd64"}

In this setup, Istio's sidecar injection is automatically configured for the 'default' namespace, so istio-init sets up the iptables rules as expected:

root@sysbox-in-docker-6d8fc47bb6-xgz6v:/# iptables -L -t nat
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
LOG        all  --  anywhere             anywhere             LOG level debug prefix "rodny-nat-prerouting "
ISTIO_INBOUND  tcp  --  anywhere             anywhere
DOCKER     all  --  anywhere             anywhere             ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT)
target     prot opt source               destination
LOG        all  --  anywhere             anywhere             LOG level debug prefix "rodny-nat-input "

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
LOG        all  --  anywhere             anywhere             LOG level debug prefix "rodny-nat-output "
ISTIO_OUTPUT  tcp  --  anywhere             anywhere
DOCKER     all  --  anywhere            !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
LOG        all  --  anywhere             anywhere             LOG level debug prefix "rodny-nat-postrouting "
MASQUERADE  all  --  172.24.0.0/16        anywhere

Chain DOCKER (2 references)
target     prot opt source               destination
RETURN     all  --  anywhere             anywhere

Chain ISTIO_INBOUND (1 references)
target     prot opt source               destination
RETURN     tcp  --  anywhere             anywhere             tcp dpt:ssh
RETURN     tcp  --  anywhere             anywhere             tcp dpt:15020
ISTIO_IN_REDIRECT  tcp  --  anywhere             anywhere

Chain ISTIO_IN_REDIRECT (2 references)
target     prot opt source               destination
REDIRECT   tcp  --  anywhere             anywhere             redir ports 15006

Chain ISTIO_OUTPUT (1 references)
target     prot opt source               destination
RETURN     all  --  127.0.0.6            anywhere
ISTIO_IN_REDIRECT  all  --  anywhere            !localhost
RETURN     all  --  anywhere             anywhere             owner UID match 1337
RETURN     all  --  anywhere             anywhere             owner GID match 1337
RETURN     all  --  anywhere             localhost
ISTIO_REDIRECT  all  --  anywhere             anywhere

Chain ISTIO_REDIRECT (1 references)
target     prot opt source               destination
REDIRECT   tcp  --  anywhere             anywhere             redir ports 15001
root@sysbox-in-docker-6d8fc47bb6-xgz6v:/#
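
The LOG targets at the top of each built-in nat chain in the listing above are not Istio rules; they look like hand-added debugging hooks. A minimal sketch of how such hooks could be installed, assuming the prefixes shown in the listing. The helper name `log_rule_cmd` is hypothetical, and the script only prints the commands so they can be reviewed before being run as root inside the POD:

```shell
#!/bin/sh
# Print the iptables command that inserts a LOG rule at position 1 of a
# built-in nat chain. The LOG options mirror what the listing displays
# ("LOG level debug prefix ..."); the helper itself is an illustration.
log_rule_cmd() {
    # $1 = chain name, e.g. PREROUTING
    suffix=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    printf 'iptables -t nat -I %s 1 -j LOG --log-level debug --log-prefix "rodny-nat-%s "\n' "$1" "$suffix"
}

# Dry run: emit one command per built-in nat chain.
for chain in PREROUTING INPUT OUTPUT POSTROUTING; do
    log_rule_cmd "$chain"
done
```

Inserting at position 1 matters here: the rule has to sit ahead of the ISTIO_* jumps, otherwise redirected packets never reach the LOG target.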

The problem is specific to TCP traffic initiated from inner containers, which is blackholed at the host level. It seems to be a direct consequence of this forwarding sequence:

  • TCP SYN packet generated from the inner container is processed by the PREROUTING chain and redirected to Istio's inbound handler (port 15006). Note that the source address here (172.24.0.2) corresponds to the egress iface of the inner container, and 172.24.0.1 is docker0's interface within this POD.
 	 ... nat-prerouting IN=docker0 OUT= PHYSIN=vethfd9bdf3 MAC=02:42:c7:d0:5d:05:02:42:ac:18:00:02:08:00 SRC=172.24.0.2 DST=74.125.20.101
  • Packet exits Istio's logic and is processed by the OUTPUT chain; however, it is now sourced from 127.0.0.6:
	 ... nat-output IN= OUT=eth0 SRC=127.0.0.6 DST=74.125.20.101 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=52031 DF PROTO=TCP SPT=51503 DPT=80 WINDOW=42600 RES=0x00 SYN URGP=0
  • Packet hits POSTROUTING chain:
	 ... nat-postrouting IN= OUT=eth0 SRC=127.0.0.6 DST=74.125.20.101 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=52031 DF PROTO=TCP SPT=51503 DPT=80 WINDOW=42600 RES=0x00 SYN URGP=0
  • Packet ends up being discarded in the host network namespace, as '127.0.0.6' is not routable:
	 ... kernel: IPv4: martian source 74.125.20.101 from 127.0.0.6, on dev cbr0
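
The sequence above can be spotted mechanically in the kernel log. Below is a rough sketch that flags netfilter LOG lines whose source address is in 127.0.0.0/8 while the packet leaves through a non-loopback interface. The function name `find_loopback_leaks` is made up for illustration, and it assumes the `IN=`/`OUT=`/`SRC=` field layout shown in the log excerpts above:

```shell
#!/bin/sh
# Read netfilter LOG lines on stdin and print those where a loopback
# source address (127.x.x.x) is paired with a non-empty, non-"lo" OUT
# interface. Lines without an egress interface (OUT= empty) are skipped.
find_loopback_leaks() {
    awk '
    {
        src = ""; out = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^SRC=/) src = substr($i, 5)
            if ($i ~ /^OUT=/) out = substr($i, 5)
        }
        if (src ~ /^127\./ && out != "" && out != "lo") print
    }'
}

# Possible usage, wherever the kernel log is readable:
#   dmesg | find_loopback_leaks
```

With the excerpts above, the nat-output and nat-postrouting lines would be reported, while the nat-prerouting line (SRC=172.24.0.2, no egress interface yet) would not.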

I have found a workaround that masquerades all traffic hitting the POSTROUTING chain with source address '127.0.0.6', but I feel that this may not be a proper or generic enough solution for this problem.

$ iptables -t nat -A POSTROUTING -s 127.0.0.6 ! -o docker0 -j MASQUERADE

$ iptables -L -t nat
...
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 MASQUERADE  all  --  *      !docker0  172.24.0.0/16        0.0.0.0/0
    0     0 MASQUERADE  all  --  *      !docker0  127.0.0.6            0.0.0.0/0
...
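
Since `-A` appends unconditionally, re-running the workaround (for example from a startup script) would stack duplicate rules. One way to guard against that, sketched here with `iptables -C` (rule check). The `ensure_masquerade` helper and the `IPTABLES` override variable are hypothetical names, not part of Sysbox or Istio:

```shell
#!/bin/sh
# Append the MASQUERADE rule only if an identical rule is not already
# present. `iptables -C` exits non-zero when no matching rule exists.
# IPTABLES is an assumed override so the logic can be exercised without
# touching real netfilter state.
IPTABLES=${IPTABLES:-iptables}

ensure_masquerade() {
    # $1 = source address to masquerade, $2 = bridge interface to exclude
    if ! "$IPTABLES" -t nat -C POSTROUTING -s "$1" ! -o "$2" -j MASQUERADE 2>/dev/null; then
        "$IPTABLES" -t nat -A POSTROUTING -s "$1" ! -o "$2" -j MASQUERADE
    fi
}

# Must run as root inside the POD's network namespace:
#   ensure_masquerade 127.0.0.6 docker0
```

This only makes the workaround repeatable; it does not address the concern that masquerading 127.0.0.6 may not be a generic fix.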

The problem is clearly not a Sysbox issue, as traffic is dropped regardless of whether the inner container is launched with Sysbox or the regular runc. However, we are tracking it here because this is a common Sysbox deployment setup in K8s scenarios.

rodnymolina • Nov 23 '20 23:11