[netstack] Invalid TCP checksums for packets from published ports in Docker Desktop

Open maddn opened this issue 5 months ago • 7 comments

Description

TCP packets from published ports (on Docker Desktop) are all arriving with invalid checksums.

I believe the issue is that the checksum is initially zeroed out. When the packet is forwarded to the container, I think the checksum is recalculated incorrectly because of the fix for bug #5340.

I don't think this is an issue for traffic intended for the container, but when I try to forward the packet to another destination, it is dropped there because of the invalid checksum.

I suggest a fix would be to calculate a fresh checksum when it's zeroed out, otherwise use the incremental update.
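A minimal sketch of that suggestion in Go, not gVisor's actual code: the helper names (sum16, fold, fullTCPChecksum, rewriteDst) and the sample addresses are hypothetical, and only the destination IPv4 address is rewritten. It treats a zero checksum field as "not filled in yet" and recomputes from scratch; otherwise it applies an RFC 1624 style incremental update. A genuine TCP checksum can occasionally be 0x0000, but a full recompute is still correct in that case, so the zero check is only a heuristic for choosing the cheaper path.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// sum16 adds data, read as big-endian 16-bit words, to a running
// one's-complement accumulator. A trailing odd byte is padded with zero.
func sum16(data []byte, acc uint32) uint32 {
	for len(data) >= 2 {
		acc += uint32(binary.BigEndian.Uint16(data))
		data = data[2:]
	}
	if len(data) == 1 {
		acc += uint32(data[0]) << 8
	}
	return acc
}

// fold reduces the accumulator to 16 bits with end-around carry.
func fold(acc uint32) uint16 {
	for acc>>16 != 0 {
		acc = (acc & 0xffff) + (acc >> 16)
	}
	return uint16(acc)
}

// fullTCPChecksum computes the checksum from scratch over the IPv4
// pseudo-header and the TCP segment, skipping the checksum field
// (bytes 16-17) so its current contents do not matter.
func fullTCPChecksum(src, dst [4]byte, seg []byte) uint16 {
	acc := sum16(src[:], 0)
	acc = sum16(dst[:], acc)
	acc += 6 + uint32(len(seg)) // protocol number (TCP) and TCP length
	acc = sum16(seg[:16], acc)  // header up to the checksum field
	acc = sum16(seg[18:], acc)  // everything after the checksum field
	return ^fold(acc)
}

// rewriteDst is a hypothetical NAT helper. It fixes up the TCP checksum
// for a destination address change: a zero field gets a fresh checksum,
// anything else gets the incremental update HC' = ~(~HC + ~m + m').
func rewriteDst(src, oldDst, newDst [4]byte, seg []byte) {
	old := binary.BigEndian.Uint16(seg[16:18])
	var sum uint16
	if old == 0 {
		sum = fullTCPChecksum(src, newDst, seg)
	} else {
		acc := uint32(^old)
		for i := 0; i < 4; i += 2 {
			m := binary.BigEndian.Uint16(oldDst[i:])
			n := binary.BigEndian.Uint16(newDst[i:])
			acc += uint32(^m) + uint32(n)
		}
		sum = ^fold(acc)
	}
	binary.BigEndian.PutUint16(seg[16:18], sum)
}

func main() {
	src := [4]byte{192, 168, 65, 1}
	oldDst := [4]byte{172, 17, 0, 2}
	newDst := [4]byte{198, 18, 134, 29}

	// Minimal 20-byte TCP header for a SYN with made-up ports.
	seg := make([]byte, 20)
	binary.BigEndian.PutUint16(seg[0:], 40394)
	binary.BigEndian.PutUint16(seg[2:], 19080)
	seg[12] = 5 << 4 // data offset: 5 words
	seg[13] = 0x02   // SYN

	want := fullTCPChecksum(src, newDst, seg)

	// Case 1: checksum field left at zero, as in the captures above.
	rewriteDst(src, oldDst, newDst, seg)
	fmt.Println("zero field matches full recompute:",
		binary.BigEndian.Uint16(seg[16:18]) == want)

	// Case 2: checksum field valid for the old destination.
	binary.BigEndian.PutUint16(seg[16:18], fullTCPChecksum(src, oldDst, seg))
	rewriteDst(src, oldDst, newDst, seg)
	fmt.Println("valid field matches full recompute:",
		binary.BigEndian.Uint16(seg[16:18]) == want)
}
```

A real fix would also need to cover port rewrites and source NAT; the same two branches apply, with the changed port words added to the incremental sum.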

Steps to reproduce

~ % docker run -p 19080:19080 --rm -it alpine
/ # apk add tcpdump
fetch https://dl-cdn.alpinelinux.org/alpine/v3.22/main/aarch64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.22/community/aarch64/APKINDEX.tar.gz
(1/2) Installing libpcap (1.10.5-r1)
(2/2) Installing tcpdump (4.99.5-r1)
Executing busybox-1.37.0-r18.trigger
OK: 9 MiB in 18 packages
/ # tcpdump -vvv -i eth0
tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes

Also run tcpdump on the Docker VM

/mnt # ./tcpdump -vvv -i eth0
tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes

Connect to localhost:19080 on host (I used a browser)

Docker VM tcpdump:

11:41:17.798844 IP (tos 0x0, ttl 64, id 45890, offset 0, flags [none], proto TCP (6), length 60)
    192.168.65.1.65125 > 192.168.65.4.19080: Flags [S], cksum 0x0000 (incorrect -> 0x2eee), seq 1759701479, win 65408, options [mss 65495,nop,nop,TS val 4148726806 ecr 0,nop,wscale 7], length 0

Container tcpdump:

11:41:17.798951 IP (tos 0x0, ttl 62, id 45890, offset 0, flags [none], proto TCP (6), length 60)
    192.168.65.1.65125 > 5c59c61d5332.19080: Flags [S], cksum 0x5597 (incorrect -> 0x8485), seq 1759701479, win 65408, options [mss 65495,nop,nop,TS val 4148726806 ecr 0,nop,wscale 7], length 0
11:41:17.799038 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)

runsc version


docker version (if using docker)

docker version
Client: Docker Engine - Community
 Version:           28.3.2
 API version:       1.43 (downgraded from 1.51)
 Go version:        go1.24.5
 Git commit:        578ccf6
 Built:             Wed Jul  9 16:14:01 2025
 OS/Arch:           linux/arm64
 Context:           default

Server: Docker Desktop 4.23.0 (120376)
 Engine:
  Version:          24.0.6
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.7
  Git commit:       1a79695
  Built:            Mon Sep  4 12:31:36 2023
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.22
  GitCommit:        8165feabfdfe38c65b599c4993d227328c231fca
 runc:
  Version:          1.1.8
  GitCommit:        v1.1.8-0-g82f18fe
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

uname

Linux 5c59c61d5332 6.3.13-linuxkit #1 SMP PREEMPT Thu Sep 7 07:48:47 UTC 2023 aarch64 Linux

kubectl (if using Kubernetes)


repo state (if built from source)

No response

runsc debug logs (if available)


maddn avatar Jul 23 '25 12:07 maddn

I may be missing something, but I don't believe the mentioned fix is the cause of this issue. The fix is in the iptables code, yet your container has no iptables rules configured.

Here is tcpdump output from both native and runsc containers; the "incorrect" checksum mark appears in both. This is a common symptom of checksum offloading being enabled on the network device, which is the default for most NICs:

avagin@avagin:~$ docker run -p 19080:19080 --rm -it alpine
...
/ # tcpdump -vvv -n -i eth0
tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
18:47:01.206326 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 16) fe80::5cc4:43ff:fe24:3bd0 > ff02::2: [icmp6 sum ok] ICMP6, router solicitation, length 16
	  source link-address option (1), length 8 (1): 5e:c4:43:24:3b:d0
	    0x0000:  5ec4 4324 3bd0
18:47:34.388957 IP (tos 0x0, ttl 64, id 38046, offset 0, flags [DF], proto TCP (6), length 60)
    172.17.0.1.51740 > 172.17.0.2.19080: Flags [S], cksum 0x5854 (incorrect -> 0x3897), seq 2132314539, win 64240, options [mss 1460,sackOK,TS val 2782918921 ecr 0,nop,wscale 7], length 0
18:47:34.388974 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    172.17.0.2.19080 > 172.17.0.1.51740: Flags [R.], cksum 0x3a41 (correct), seq 0, ack 2132314540, win 0, length 0
avagin@avagin:~$ docker run --runtime runsc -p 19080:19080 --rm -it alpine
...
/ # tcpdump -i eth0 -vvv -n
tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
19:06:33.682725 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 16) fe80::850:d5ff:fe8b:c741 > ff02::2: [icmp6 sum ok] ICMP6, router solicitation, length 16
	  source link-address option (1), length 8 (1): 0a:50:d5:8b:c7:41
	    0x0000:  0a50 d58b c741
19:06:38.670633 IP (tos 0x0, ttl 64, id 62988, offset 0, flags [DF], proto TCP (6), length 60)
    172.17.0.1.33510 > 172.17.0.2.19080: Flags [S], cksum 0x5854 (incorrect -> 0xc032), seq 2330511243, win 64240, options [mss 1460,sackOK,TS val 2784063202 ecr 0,nop,wscale 7], length 0
19:06:38.670800 IP (tos 0x0, ttl 64, id 63452, offset 0, flags [none], proto TCP (6), length 40)
    172.17.0.2.19080 > 172.17.0.1.33510: Flags [R.], cksum 0x37c7 (correct), seq 0, ack 2330511244, win 0, length 

avagin avatar Jul 31 '25 19:07 avagin

I think you are correct that checksum offloading causes the initial zeroed-out checksum. But by the time you capture the traffic on the interface inside the container, hasn't a DNAT rule on the Docker VM already updated the destination IP address and recalculated the checksum using an incremental update?

maddn avatar Jul 31 '25 19:07 maddn

@maddn I am sorry, but I don't understand your question. Could you give more detail? If you are talking about outgoing packets, DNAT rules are applied in the host network namespace. If you are capturing packets inside the container, you should see the original packets generated by the container stack.

avagin avatar Aug 04 '25 18:08 avagin

Sorry for the confusion. I'm referring to the NAT rule that Docker Desktop creates when you publish a port.

Start a container with a published port:

docker run -p 19080:19080 --rm -it alpine

In another terminal, enter the Docker VM and check the NAT rules:

 ~ % docker run -it --rm --privileged --pid=host justincormack/nsenter1

sh-5.2# iptables -L -t nat
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
DOCKER     all  --  anywhere             anywhere             ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
DOCKER     all  --  anywhere             anywhere             ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
MASQUERADE  all  --  anywhere             anywhere             ADDRTYPE match src-type LOCAL
MASQUERADE  all  --  172.17.0.0/16        anywhere
MASQUERADE  tcp  --  172.17.0.2           172.17.0.2           tcp dpt:19080

Chain DOCKER (2 references)
target     prot opt source               destination
DNAT       tcp  --  anywhere             anywhere             tcp dpt:19080 to:172.17.0.2:19080

maddn avatar Aug 04 '25 18:08 maddn

I don't want to overcomplicate this issue though.

The problem is that any NAT rule inside the container produces invalid TCP checksums for traffic from published ports: the packet always arrives with a checksum of 0, so the incremental update cannot produce a correct value.

~ % docker run -p 19080:19080 --rm -it --cap-add=NET_ADMIN alpine
/ # iptables -A PREROUTING -p tcp -m tcp --dport 19080 -j DNAT --to-destination 198.18.134.29:80 -t nat
/ # tcpdump -i eth0 -vvv tcp port 19080 or tcp port 80
tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes


18:52:01.558800 IP (tos 0x0, ttl 63, id 52988, offset 0, flags [none], proto TCP (6), length 60)
    192.168.65.1.40394 > 7b8d50071a4f.19080: Flags [S], cksum 0x0000 (incorrect -> 0x6c70), seq 2127016582, win 65408, options [mss 65495,nop,nop,TS val 1196480061 ecr 0,nop,wscale 7], length 0
18:52:01.558820 IP (tos 0x0, ttl 62, id 52988, offset 0, flags [none], proto TCP (6), length 60)
    192.168.65.1.40394 > 198.18.134.29.80: Flags [S], cksum 0xaa1b (incorrect -> 0x168c), seq 2127016582, win 65408, options [mss 65495,nop,nop,TS val 1196480061 ecr 0,nop,wscale 7], length 0

You can see the packet starts with a checksum of 0x0000; after the DNAT rule is applied, the updated checksum is still incorrect. The packet is then dropped at the destination.
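The numbers in the captures can be checked against this explanation. An incremental update adds the same one's-complement delta to whatever value is in the field, so the gap between the stored checksum and the correct one should be identical before and after NAT. The Go sketch below (helper names onesSub and checksumError are hypothetical) computes that gap for the before/after pairs reported by tcpdump in this thread. It only shows the captures are consistent with an incremental update applied to a zeroed field; it does not identify the responsible code path.

```go
package main

import "fmt"

// onesSub returns a - b in 16-bit one's-complement arithmetic,
// computed as a + ~b with end-around carry.
func onesSub(a, b uint16) uint16 {
	d := uint32(a) + uint32(^b)
	d = (d & 0xffff) + (d >> 16)
	return uint16(d)
}

// checksumError returns the one's-complement gap between the sum implied
// by the checksum found in the packet and the sum implied by the checksum
// tcpdump says it should have been.
func checksumError(field, correct uint16) uint16 {
	return onesSub(^field, ^correct)
}

func main() {
	// Capture from the DNAT test: before and after the rule inside the container.
	fmt.Printf("DNAT test, before: %#04x\n", checksumError(0x0000, 0x6c70))
	fmt.Printf("DNAT test, after:  %#04x\n", checksumError(0xaa1b, 0x168c))

	// Capture from the original report: Docker VM, then inside the container.
	fmt.Printf("report, Docker VM: %#04x\n", checksumError(0x0000, 0x2eee))
	fmt.Printf("report, container: %#04x\n", checksumError(0x5597, 0x8485))
}
```

If the two values in each pair are equal, the checksum error was carried through the rewrite unchanged, which is what an incremental update on an unfilled checksum would produce.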

maddn avatar Aug 04 '25 19:08 maddn

@maddn you don't use runsc, do you? Did you file this bug only because Docker Desktop uses the gVisor network stack internally?

avagin avatar Aug 05 '25 17:08 avagin

Yes. If I switch the network type in Docker Desktop to vpnkit instead, this bug doesn't happen and the checksums are recalculated correctly. When it uses gVisor, the checksums are incorrect.

maddn avatar Aug 05 '25 17:08 maddn