TCP-RESET Packets are not masqueraded
Description of problem: A few outgoing Docker TCP packets (only the TCP RESET packets!) are not masqueraded. These connections are not closed properly, which leads to many open connections that eventually time out.
docker version:
Client:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5
Built: Fri Nov 20 13:25:01 UTC 2015
OS/Arch: linux/amd64
Server:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5
Built: Fri Nov 20 13:25:01 UTC 2015
OS/Arch: linux/amd64
docker info:
Containers: 24
Images: 152
Server Version: 1.9.1
Storage Driver: devicemapper
Pool Name: docker-253:1-655723-pool
Pool Blocksize: 65.54 kB
Base Device Size: 10.74 GB
Backing Filesystem:
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 6.73 GB
Data Space Total: 107.4 GB
Data Space Available: 90.2 GB
Metadata Space Used: 8.729 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.139 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.93-RHEL7 (2015-01-28)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.18.17-13.el7.x86_64
Operating System: CentOS Linux 7 (Core)
CPUs: 4
Total Memory: 23.59 GiB
uname -a:
Linux pcp-cons-vm1.pcp.asudc.net 3.18.17-13.el7.x86_64 #1 SMP Wed Jul 22 14:20:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Environment details (AWS, VirtualBox, physical, etc.): VMWare VM
How reproducible:
Steps to Reproduce:
- Initial setup:
Containers are built with the docker-maven-plugin. The base image is centos:7. Run commands:
rm /etc/localtime
ln -s /usr/share/zoneinfo/Europe/Berlin /etc/localtime
curl -jksSLH "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u66-b17/jdk-8u66-linux-x64.rpm > jdk-8u66-linux-x64.rpm
yum -y install jdk-8u66-linux-x64.rpm
rm -f jdk-8u66-linux-x64.rpm
- Containers are started with Ansible:
- docker:
    image: "{{ image_name }}:{{ image_version }}"
    name: "{{ application_name }}"
    expose: "{{ application_port }}" # workaround for https://github.com/ansible/ansible-modules-core/issues/147
    ports: "{{ application_port }}:{{ application_port }},{{ application_jmx_port }}:{{ application_jmx_port }}"
    env:
      application.stage: "{{ stage }}"
      application.host: "{{ inventory_hostname }}"
    state: "reloaded"
    restart_policy: "always"
    username: "{{ username }}"
    password: "{{ password }}"
    email: "{{ email }}"
- Capture the network traffic with tcpdump:
$ tcpdump -vvv '(src net 172.17.0.0/16 and not dst net 172.17.0.0/16)' -i eno16780032
Actual Results:
tcpdump: listening on eno16780032, link-type EN10MB (Ethernet), capture size 65535 bytes
09:40:20.765078 IP (tos 0x0, ttl 63, id 63793, offset 0, flags [DF], proto TCP (6), length 40)
172.17.0.94.44220 > XXXXXXX.https: Flags [R], cksum 0x733e (correct), seq 3364944719, win 0, length 0
09:40:23.114700 IP (tos 0x0, ttl 63, id 64957, offset 0, flags [DF], proto TCP (6), length 40)
172.17.0.94.44386 > XXXXXXX.https: Flags [R], cksum 0x04f6 (correct), seq 1452858090, win 0, length 0
09:40:30.599823 IP (tos 0x0, ttl 63, id 27499, offset 0, flags [DF], proto TCP (6), length 40)
172.17.0.94.41293 > XXXXXXX.https: Flags [R], cksum 0x70d6 (correct), seq 2154385743, win 0, length 0
09:40:43.076053 IP (tos 0x0, ttl 63, id 7793, offset 0, flags [DF], proto TCP (6), length 40)
172.17.0.94.45254 > XXXXXXX.https: Flags [R], cksum 0xae49 (correct), seq 3912735635, win 0, length 0
09:40:47.884495 IP (tos 0x0, ttl 63, id 11156, offset 0, flags [DF], proto TCP (6), length 40)
172.17.0.94.45514 > XXXXXXX.https: Flags [R], cksum 0x3db8 (correct), seq 509662712, win 0, length 0
Expected Results:
The tcpdump output is empty (every outgoing TCP packet is masqueraded).
Additional info:
iptables -t nat -L:
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
DOCKER all -- anywhere anywhere ADDRTYPE match dst-type LOCAL
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
DOCKER all -- anywhere anywhere ADDRTYPE match dst-type LOCAL
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
MASQUERADE all -- anywhere anywhere ADDRTYPE match src-type LOCAL
MASQUERADE all -- 172.17.0.0/16 anywhere
MASQUERADE tcp -- 172.17.0.14 172.17.0.14 tcp dpt:XXXX2
MASQUERADE tcp -- 172.17.0.14 172.17.0.14 tcp dpt:XXXX1
Chain DOCKER (2 references)
target prot opt source destination
DNAT tcp -- anywhere anywhere tcp dpt:19392 to:172.17.0.14:XXXX2
DNAT tcp -- anywhere anywhere tcp dpt:19391 to:172.17.0.14:XXXX1
iptables -L:
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
DOCKER all -- anywhere anywhere
ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED
ACCEPT all -- anywhere anywhere
ACCEPT all -- anywhere anywhere
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Chain DOCKER (1 references)
target prot opt source destination
ACCEPT tcp -- anywhere 172.17.0.14 tcp dpt:XXXX2
ACCEPT tcp -- anywhere 172.17.0.14 tcp dpt:XXXX1
/cc @mavenugo
ping @mavenugo Could you take a look at this, please?
@goekhanm I don't have a particular idea on this issue. But I will ask a few questions that might help.
- Could you please try 1.10.3 and confirm the behavior?
- Do you have the userland-proxy disabled?
- Do you have firewalld enabled?
- Can you please capture the output of iptables -nvL, which will also give us details on the packet counters? (A rough way to check all of this is sketched just below.)
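For reference, these can be checked on the host roughly like this (just a sketch; the daemon config path is an assumption and varies between setups):

# userland proxy: look for "userland-proxy": false in the daemon config,
# or --userland-proxy=false on the running dockerd/daemon command line
grep -i userland-proxy /etc/docker/daemon.json
ps -ef | grep -i userland-proxy

# firewalld status
systemctl is-active firewalld

# filter and nat tables, including packet/byte counters
iptables -nvL
iptables -t nat -nvL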
BTW, I tried this scenario with the latest Docker engine on Ubuntu 14.04 and it seems to work just fine.
Is this still an issue?
The issue is still reproducible and we're facing it.
@upietz could you provide more info about your setup (output of docker version, and docker info), and additional info from @mavenugo's comment above https://github.com/docker/docker/issues/18630#issuecomment-195989351
docker version:
Client:
Version: 1.12.2
API version: 1.24
Go version: go1.6.3
Git commit: bb80604
Built: Tue Oct 11 17:43:41 2016
OS/Arch: linux/amd64
Server:
Version: 1.12.2
API version: 1.24
Go version: go1.6.3
Git commit: bb80604
Built: Tue Oct 11 17:43:41 2016
OS/Arch: linux/amd64
docker info:
Containers: 100
 Running: 25
 Paused: 0
 Stopped: 75
Images: 22
Server Version: 1.12.2
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 544
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null overlay host
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
OSType: linux
Architecture: x86_64
CPUs: 24
Total Memory: 47.26 GiB
Name: mesos-slave5
ID: HGK2:BUX2:CUKS:LLW5:FLLR:TCEU:BYP4:5J66:V6Q4:MT2O:QFM5:XS44
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No kernel memory limit support
WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
Insecure Registries:
 127.0.0.0/8
iptables -nvL
Chain INPUT (policy ACCEPT 647K packets, 75M bytes) pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
9119M 14T DOCKER-ISOLATION all -- * * 0.0.0.0/0 0.0.0.0/0
29G 33T DOCKER all -- * docker0 0.0.0.0/0 0.0.0.0/0
23G 21T ACCEPT all -- * docker0 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
28G 48T ACCEPT all -- docker0 !docker0 0.0.0.0/0 0.0.0.0/0
0 0 ACCEPT all -- docker0 docker0 0.0.0.0/0 0.0.0.0/0
Chain OUTPUT (policy ACCEPT 1532K packets, 694M bytes) pkts bytes target prot opt in out source destination
Chain DOCKER (1 references)
pkts bytes target prot opt in out source destination
374K 37M ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.16 tcp dpt:5051
562K 46M ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.16 tcp dpt:5050
562K 48M ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.20 tcp dpt:4567
374K 37M ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.14 tcp dpt:5051
62M 14G ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.14 tcp dpt:5050
936K 68M ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.23 tcp dpt:4000
404M 1761G ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.9 tcp dpt:3000
13M 1434M ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.11 tcp dpt:8080
12M 1351M ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.18 tcp dpt:8080
12M 1308M ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.19 tcp dpt:8080
1819K 151M ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.4 tcp dpt:8080
324K 33M ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.24 tcp dpt:8080
318K 32M ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.32 tcp dpt:5051
55M 13G ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.32 tcp dpt:5050
1666K 186M ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.33 tcp dpt:8080
362K 31M ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.13 tcp dpt:4567
250M 1096G ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.10 tcp dpt:3000
8503K 926M ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.26 tcp dpt:8080
0 0 ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.3 tcp dpt:9010
3538K 1131M ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.3 tcp dpt:4567
0 0 ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.6 tcp dpt:9010
3544K 1134M ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.6 tcp dpt:4567
0 0 ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.7 tcp dpt:9010
3560K 1141M ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.7 tcp dpt:4567
5439K 27G ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.15 tcp dpt:8080
1259K 597M ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.17 tcp dpt:8080
1850K 9168M ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.5 tcp dpt:8080
590K 2804M ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.8 tcp dpt:8080
9625 1827K ACCEPT tcp -- !docker0 docker0 0.0.0.0/0 172.17.0.12 tcp dpt:4567
Chain DOCKER-ISOLATION (1 references)
pkts bytes target prot opt in out source destination
9119M 14T RETURN all -- * * 0.0.0.0/0 0.0.0.0/0
userland-proxy is enabled
Kernel version 3.16.0-4-amd64 probably has something to do with this issue; that kernel is most likely not receiving the relevant fixes.
Can you reproduce this with another distribution, another kernel, or both?
I am having the same problem -- pretty confused! Pretty simple to repro though: https://gist.github.com/moribellamy/43649b23836786a65bc583c3210a8be5
I've set up a simple dockerized server that listens for a TCP connection and immediately sends an RST packet to the first client, then exits.
If you initiate the connection from inside the container (telnet will do) and run tcpdump, you get this (expected output, notice the [R] packets):
root@4fcb979b150c:/# tcpdump -i lo -e 'port 12345' # inside the docker container
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
00:00:46.990936 00:00:00:00:00:00 (oui Ethernet) > 00:00:00:00:00:00 (oui Ethernet), ethertype IPv4 (0x0800), length 74: localhost.57382 > localhost.12345: Flags [S], seq 3411329055, win 43690, options [mss 65495,sackOK,TS val 1613614 ecr 0,nop,wscale 7], length 0
00:00:46.990960 00:00:00:00:00:00 (oui Ethernet) > 00:00:00:00:00:00 (oui Ethernet), ethertype IPv4 (0x0800), length 74: localhost.12345 > localhost.57382: Flags [S.], seq 1273088479, ack 3411329056, win 43690, options [mss 65495,sackOK,TS val 1613614 ecr 1613614,nop,wscale 7], length 0
00:00:46.990980 00:00:00:00:00:00 (oui Ethernet) > 00:00:00:00:00:00 (oui Ethernet), ethertype IPv4 (0x0800), length 66: localhost.57382 > localhost.12345: Flags [.], ack 1, win 342, options [nop,nop,TS val 1613614 ecr 1613614], length 0
00:00:46.991323 00:00:00:00:00:00 (oui Ethernet) > 00:00:00:00:00:00 (oui Ethernet), ethertype IPv4 (0x0800), length 66: localhost.12345 > localhost.57382: Flags [R.], seq 1, ack 1, win 342, options [nop,nop,TS val 1613614 ecr 1613614], length 0
If you initiate the connection from outside the container (e.g. docker run -p 12345:12345 local:rst), you get no such reset packets
tcpdump -i lo0 -e 'port 12345' # on the host machine
17:20:15.220923 AF IPv6 (30), length 88: localhost.56449 > localhost.italk: Flags [S], seq 212628271, win 65535, options [mss 16324,nop,wscale 5,nop,nop,TS val 197372606 ecr 0,sackOK,eol], length 0
17:20:15.221009 AF IPv6 (30), length 88: localhost.italk > localhost.56449: Flags [S.], seq 3769277609, ack 212628272, win 65535, options [mss 16324,nop,wscale 5,nop,nop,TS val 197372606 ecr 197372606,sackOK,eol], length 0
17:20:15.221020 AF IPv6 (30), length 76: localhost.56449 > localhost.italk: Flags [.], ack 1, win 12743, options [nop,nop,TS val 197372606 ecr 197372606], length 0
17:20:15.221028 AF IPv6 (30), length 76: localhost.italk > localhost.56449: Flags [.], ack 1, win 12743, options [nop,nop,TS val 197372606 ecr 197372606], length 0
17:20:15.222191 AF IPv6 (30), length 76: localhost.italk > localhost.56449: Flags [F.], seq 1, ack 1, win 12743, options [nop,nop,TS val 197372607 ecr 197372606], length 0
17:20:15.222213 AF IPv6 (30), length 76: localhost.56449 > localhost.italk: Flags [.], ack 2, win 12743, options [nop,nop,TS val 197372607 ecr 197372607], length 0
17:20:15.222276 AF IPv6 (30), length 76: localhost.56449 > localhost.italk: Flags [F.], seq 1, ack 2, win 12743, options [nop,nop,TS val 197372607 ecr 197372607], length 0
17:20:15.222310 AF IPv6 (30), length 76: localhost.italk > localhost.56449: Flags [.], ack 2, win 12743, options [nop,nop,TS val 197372607 ecr 197372607], length 0
In this particular case, telnet does the right thing either way (it shuts down). My guess is that telnet responds to the server's FIN with its own FIN, but I'm not a TCP guru. In general, though, some applications need that RST packet (for the reasons cited in the initial report).
EDIT: here is my docker info. Running on macOS, addressing @unclejack's concern that it may be the kernel version.
[0] 05:21:45 ~/rooms/rst$ docker info
Containers: 28
 Running: 1
 Paused: 0
 Stopped: 27
Images: 127
Server Version: 17.06.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.31-moby
Operating System: Alpine Linux v3.5
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 1.952GiB
Name: moby
ID: S6N2:K4Z7:6CUL:6S2X:HEV2:DBG2:LGE6:YH35:IGL5:BCZI:QKMJ:MCX2
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 29
 Goroutines: 51
 System Time: 2017-07-11T00:31:15.184817286Z
 EventsListeners: 1
No Proxy: *.local, 169.254/16
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
@mavenugo I'll ask just because someone else did before :). Could you give a quick comment, or include someone who is willing to give a quick comment, on my previous post? Based on my experiment above and other corroborating stories, Docker containers only seem to support orderly TCP shutdown (FIN) and not abortive resets (RST).
Could be a subtle issue with our setups, but I bet my experiment will repro for most people. Whether or not I interpreted my experiment correctly, IDK.
It's been a few years so at this point I'm just too curious. Does anyone know more about the scope of this bug? Is it limited to only the scenario where OSX is the host OS, or something?
I just find it odd that docker is so ubiquitous in the world of production software, but Layer 4 networking doesn't seem to work fully. Do people mostly use docker for higher level protocols which paper over this issue?
EDIT: Regarding the earlier "old kernel" concerns, my repro instructions in https://github.com/moby/moby/issues/18630#issuecomment-314288184 still reproduce on Ubuntu 18.04.
@moribellamy I think conntrack might be marking the packets as INVALID and not performing the NAT.
Can you please investigate further by running iptables with the LOG module, e.g.
sudo iptables -A INPUT -j LOG -m state --state INVALID
or by viewing the stats in the conntrack CLI?
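For example, something along these lines should show whether INVALID packets are involved (a sketch; the rules may need to be placed differently on a given host, and the conntrack userspace tool has to be installed):

# log packets that conntrack classifies as INVALID, both arriving and being forwarded
sudo iptables -I INPUT 1 -m conntrack --ctstate INVALID -j LOG --log-prefix "invalid-in: "
sudo iptables -I FORWARD 1 -m conntrack --ctstate INVALID -j LOG --log-prefix "invalid-fwd: "

# watch the kernel log for the tagged packets
journalctl -kf | grep invalid-

# conntrack statistics; rising insert_failed/drop counters point at table pressure
sudo conntrack -S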
Any new information or updates regarding this possible bug?
My theory is that this issue only manifests when macOS is the host OS. Docker on macOS is totally different from most production stacks, since it runs Linux hypervised by https://github.com/moby/hyperkit, in contrast to the native containerization setup you would see on a Linux distro.
I just strongly suspect that if this ever happened in prod, someone would have fixed it.
Anecdotal data: I no longer have an issue with it because it was only affecting my development workstation. As soon as I found out the issue wasn't happening in prod (for me), I just tolerated this issue instead of taking the time to debug it.
Today we noticed several production hosts where this was happening: leaking TCP reset packets due to invalid conntrack state. A few of the hosts experiencing the issue on a regular basis were hitting the conntrack state table's maximum size.
We found some other hosts, which we are still investigating, that are not exceeding the conntrack state table's maximum size. If I understand the problem, the NAT engine can't perform NAT on invalid packets, so it just passes them on to the host interface.
Reducing the conntrack state table size, or bombarding a host until the table is full, might be one way to generate INVALID ctstate packets that escape onto the host interface. Just a hypothesis at the moment.
I might try to replicate it myself if I get some time, to see if that helps narrow the scope of this issue.
"kernel: nf_conntrack: table full, dropping packet" << Should appear in the systemd journal once the conntrack table is full if anyone wants to try this as a potential way to replicate.
OS: Linux
Distribution: CentOS 7
Several different Docker versions and Linux kernel versions are experiencing the same problem.
The issue seems to have been open for 7 years already. Any idea why it cannot be addressed?