libnetwork
Strange iptables nat MASQUERADE rules
libnetwork creates some really strange MASQUERADE rules in the POSTROUTING chain of the nat table when publishing ports in Docker, and I can't make any sense of them. For example, after publishing 34197/udp and 27015/tcp, the following rules end up in iptables:
# iptables -t nat -L -v
...
Chain POSTROUTING (policy ACCEPT 258 packets, 15578 bytes)
 pkts bytes target     prot opt in     out      source               destination
 2806 88291 MASQUERADE all  --  any    !docker0 172.17.0.0/16        anywhere
    0     0 MASQUERADE udp  --  any    any      172.17.0.2           172.17.0.2           udp dpt:34197
    0     0 MASQUERADE tcp  --  any    any      172.17.0.2           172.17.0.2           tcp dpt:27015
...
The first rule is for traffic from the container(s) to the outside and makes sense. But the other two rules supposedly masquerade traffic going from the container's internal IP to the published port on the container's internal IP, rewriting the source to the host's address on that interface (usually 172.17.0.1). As I understand iptables, the POSTROUTING chain of the nat table is never consulted for packets with a local destination, meaning these rules cannot possibly be hit. Is there some kind of networking voodoo here I'm not getting?
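Whether POSTROUTING is consulted depends on the routing decision, which is made after PREROUTING and therefore after any DNAT. The following is only a rough sketch of that hook order for a packet arriving on an interface; the function name `hooks_for_incoming` and the host address 192.0.2.10 are made up for illustration, and the model ignores locally generated packets entirely.

```python
def hooks_for_incoming(dst_after_dnat: str, host_addresses: set[str]) -> list[str]:
    """Simplified netfilter hook order for a packet that arrives on an interface.

    dst_after_dnat is the destination address as seen by the routing
    decision, i.e. after any PREROUTING DNAT has been applied.
    """
    hooks = ["PREROUTING"]
    if dst_after_dnat in host_addresses:
        # Delivered to a socket on the host itself: POSTROUTING is not traversed.
        hooks.append("INPUT")
    else:
        # Routed onward to another machine or container: POSTROUTING is traversed.
        hooks += ["FORWARD", "POSTROUTING"]
    return hooks


# 172.17.0.1 is the docker0 address from the post; 192.0.2.10 is a hypothetical host IP.
HOST_ADDRESSES = {"172.17.0.1", "192.0.2.10"}

print(hooks_for_incoming("172.17.0.1", HOST_ADDRESSES))  # ['PREROUTING', 'INPUT']
print(hooks_for_incoming("172.17.0.2", HOST_ADDRESSES))  # ['PREROUTING', 'FORWARD', 'POSTROUTING']
```

In this model the container address 172.17.0.2 is not one of the host's own addresses, so a packet whose destination ends up as 172.17.0.2 counts as forwarded rather than local.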
The code that does this (as far as I can tell) is lines 273-283 in iptables.go. It was added by @porjo in https://github.com/moby/moby/commit/0da92633b4161ed1f8babe5ec4a9fe98257d34b5#diff-ba8c3ab87579147ddeff26dd29c70f44R149, a commit described only as "Create tests for pkg/iptables" in the commit message. It's part of the "Move per-container forward rules to DOCKER chain" (#7003) pull request in Moby, but the discussion on that pull request doesn't appear to explain why these rules were added. The only thing I can think of is that they were supposed to be some kind of outbound port mapping, but they are far too broken for that.
Looking for answers too.
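I found that the rule is used when hairpin NAT is enabled (--userland-proxy=false). When a packet from the container's internal IP to the host's published port arrives, its destination gets DNATed to the published port on the container's internal IP. The packet then has the container's IP as both source and destination, which is exactly what these rules match. Since it is forwarded back out to the container rather than delivered to the host, it does pass through POSTROUTING, where the source is masqueraded so the container doesn't receive a packet that appears to come from itself.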
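To make the hairpin case concrete, here is a small simulation of the two address rewrites, using the addresses from the original post. It is a sketch under assumptions, not libnetwork's implementation: the host address 192.0.2.10, the `Packet` type, and the `dnat`, `masquerade`, and `hairpin` helpers are all hypothetical, and the published host port is assumed to equal the container port.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    proto: str
    dport: int


HOST_IP = "192.0.2.10"       # hypothetical host address
BRIDGE_IP = "172.17.0.1"     # docker0 address mentioned in the post
CONTAINER_IP = "172.17.0.2"  # container address from the rules in the post


def dnat(pkt: Packet) -> Packet:
    """Stand-in for the PREROUTING DNAT rule of the published port 27015/tcp."""
    if pkt.dst == HOST_IP and (pkt.proto, pkt.dport) == ("tcp", 27015):
        return replace(pkt, dst=CONTAINER_IP)
    return pkt


def masquerade(pkt: Packet, out_iface_ip: str) -> Packet:
    """Stand-in for: MASQUERADE tcp 172.17.0.2 -> 172.17.0.2 tcp dpt:27015."""
    if (pkt.src, pkt.dst, pkt.proto, pkt.dport) == (CONTAINER_IP, CONTAINER_IP, "tcp", 27015):
        # MASQUERADE rewrites the source to the outgoing interface's address.
        return replace(pkt, src=out_iface_ip)
    return pkt


def hairpin(pkt: Packet) -> Packet:
    """Apply DNAT first, then masquerade on the way back out via docker0."""
    return masquerade(dnat(pkt), BRIDGE_IP)


# The container connects to its own published port through the host address.
from_container = hairpin(Packet(CONTAINER_IP, HOST_IP, "tcp", 27015))
print(from_container.src, from_container.dst)  # 172.17.0.1 172.17.0.2

# An outside client (hypothetical address) is DNATed but keeps its source.
from_outside = hairpin(Packet("198.51.100.7", HOST_IP, "tcp", 27015))
print(from_outside.src, from_outside.dst)      # 198.51.100.7 172.17.0.2
```

Only the container talking to its own published port produces a packet with identical source and destination, which is why the rule looks nonsensical when read in isolation.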