[netstack] Invalid TCP checksums for packets from published ports in Docker Desktop
Description
TCP packets from published ports (on Docker Desktop) are all arriving with invalid checksums.
I believe the issue is that the checksum is initially zeroed out. When the packet is forwarded to the container, I think the checksum is recalculated incorrectly because of the fix for bug #5340.
I don't think this is an issue for traffic intended for the container, but when I try to forward the packet to another destination, it is dropped there because of the invalid checksum.
I suggest a fix: calculate a fresh checksum when the existing one is zeroed out, and otherwise use the incremental update.
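To sketch what I mean (Go, illustration only; the function names here are hypothetical, not gVisor's actual API):

// natsketch illustrates the suggested behaviour: recompute when the checksum
// field is a zeroed offload placeholder, otherwise update incrementally.
package natsketch

// fold reduces a 32-bit one's-complement sum to 16 bits (RFC 1071).
func fold(sum uint32) uint16 {
	for sum > 0xffff {
		sum = (sum >> 16) + (sum & 0xffff)
	}
	return uint16(sum)
}

// fullChecksum recomputes the checksum from scratch; words stands in for the
// pseudo-header plus TCP segment, already reflecting the rewritten address.
func fullChecksum(words []uint16) uint16 {
	var sum uint32
	for _, w := range words {
		sum += uint32(w)
	}
	return ^fold(sum)
}

// incrementalUpdate folds one changed 16-bit word into an existing checksum
// (RFC 1624, eq. 3: HC' = ~(~HC + ~m + m')).
func incrementalUpdate(hc, oldWord, newWord uint16) uint16 {
	return ^fold(uint32(^hc) + uint32(^oldWord) + uint32(newWord))
}

// checksumAfterNAT treats a zeroed checksum field as "left for offload, never
// filled in" and recomputes in full; otherwise the incremental update is safe.
func checksumAfterNAT(stored uint16, rewrittenWords []uint16, oldWord, newWord uint16) uint16 {
	if stored == 0 {
		return fullChecksum(rewrittenWords)
	}
	return incrementalUpdate(stored, oldWord, newWord)
}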
Steps to reproduce
~ % docker run -p 19080:19080 --rm -it alpine
/ # apk add tcpdump
fetch https://dl-cdn.alpinelinux.org/alpine/v3.22/main/aarch64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.22/community/aarch64/APKINDEX.tar.gz
(1/2) Installing libpcap (1.10.5-r1)
(2/2) Installing tcpdump (4.99.5-r1)
Executing busybox-1.37.0-r18.trigger
OK: 9 MiB in 18 packages
/ # tcpdump -vvv -i eth0
tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
Also run tcpdump on the Docker VM:
/mnt # ./tcpdump -vvv -i eth0
tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
Connect to localhost:19080 on the host (I used a browser)
Docker VM tcpdump:
11:41:17.798844 IP (tos 0x0, ttl 64, id 45890, offset 0, flags [none], proto TCP (6), length 60)
192.168.65.1.65125 > 192.168.65.4.19080: Flags [S], cksum 0x0000 (incorrect -> 0x2eee), seq 1759701479, win 65408, options [mss 65495,nop,nop,TS val 4148726806 ecr 0,nop,wscale 7], length 0
Container tcpdump:
11:41:17.798951 IP (tos 0x0, ttl 62, id 45890, offset 0, flags [none], proto TCP (6), length 60)
192.168.65.1.65125 > 5c59c61d5332.19080: Flags [S], cksum 0x5597 (incorrect -> 0x8485), seq 1759701479, win 65408, options [mss 65495,nop,nop,TS val 4148726806 ecr 0,nop,wscale 7], length 0
11:41:17.799038 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
runsc version
docker version (if using docker)
docker version
Client: Docker Engine - Community
 Version:           28.3.2
 API version:       1.43 (downgraded from 1.51)
 Go version:        go1.24.5
 Git commit:        578ccf6
 Built:             Wed Jul 9 16:14:01 2025
 OS/Arch:           linux/arm64
 Context:           default

Server: Docker Desktop 4.23.0 (120376)
 Engine:
  Version:          24.0.6
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.7
  Git commit:       1a79695
  Built:            Mon Sep 4 12:31:36 2023
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.22
  GitCommit:        8165feabfdfe38c65b599c4993d227328c231fca
 runc:
  Version:          1.1.8
  GitCommit:        v1.1.8-0-g82f18fe
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
uname
Linux 5c59c61d5332 6.3.13-linuxkit #1 SMP PREEMPT Thu Sep 7 07:48:47 UTC 2023 aarch64 Linux
kubectl (if using Kubernetes)
repo state (if built from source)
No response
runsc debug logs (if available)
I may be missing something, but I don't believe the mentioned fix is the cause of this issue. The fix is in the iptables code, yet your container has no iptables rules configured.
Here is tcpdump output from both a native and a runsc container; the incorrect-checksum mark appears in both. This is a common symptom of checksum offloading being enabled on the network device, which is the default for most NICs:
avagin@avagin:~$ docker run -p 19080:19080 --rm -it alpine
...
/ # tcpdump -vvv -n -i eth0
tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
18:47:01.206326 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 16) fe80::5cc4:43ff:fe24:3bd0 > ff02::2: [icmp6 sum ok] ICMP6, router solicitation, length 16
source link-address option (1), length 8 (1): 5e:c4:43:24:3b:d0
0x0000: 5ec4 4324 3bd0
18:47:34.388957 IP (tos 0x0, ttl 64, id 38046, offset 0, flags [DF], proto TCP (6), length 60)
172.17.0.1.51740 > 172.17.0.2.19080: Flags [S], cksum 0x5854 (incorrect -> 0x3897), seq 2132314539, win 64240, options [mss 1460,sackOK,TS val 2782918921 ecr 0,nop,wscale 7], length 0
18:47:34.388974 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
172.17.0.2.19080 > 172.17.0.1.51740: Flags [R.], cksum 0x3a41 (correct), seq 0, ack 2132314540, win 0, length 0
avagin@avagin:~$ docker run --runtime runsc -p 19080:19080 --rm -it alpine
...
/ # tcpdump -i eth0 -vvv -n
tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
19:06:33.682725 IP6 (hlim 255, next-header ICMPv6 (58) payload length: 16) fe80::850:d5ff:fe8b:c741 > ff02::2: [icmp6 sum ok] ICMP6, router solicitation, length 16
source link-address option (1), length 8 (1): 0a:50:d5:8b:c7:41
0x0000: 0a50 d58b c741
19:06:38.670633 IP (tos 0x0, ttl 64, id 62988, offset 0, flags [DF], proto TCP (6), length 60)
172.17.0.1.33510 > 172.17.0.2.19080: Flags [S], cksum 0x5854 (incorrect -> 0xc032), seq 2330511243, win 64240, options [mss 1460,sackOK,TS val 2784063202 ecr 0,nop,wscale 7], length 0
19:06:38.670800 IP (tos 0x0, ttl 64, id 63452, offset 0, flags [none], proto TCP (6), length 40)
172.17.0.2.19080 > 172.17.0.1.33510: Flags [R.], cksum 0x37c7 (correct), seq 0, ack 2330511244, win 0, length
I think you are correct that checksum offloading causes the initial zeroed-out checksum. But when you capture the traffic on the interface inside the container, hasn't a DNAT rule on the Docker VM already updated the destination IP address and recalculated the checksum using an incremental update?
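By "incremental update" I mean the standard RFC 1624 adjustment that folds each changed 16-bit word into the existing checksum, roughly this (Go, illustration only, not gVisor's actual code):

// RFC 1624, eq. 3: HC' = ~(~HC + ~m + m'), applied once for each rewritten
// 16-bit word, e.g. each half of the destination address changed by DNAT.
func incrementalUpdate(hc, oldWord, newWord uint16) uint16 {
	sum := uint32(^hc) + uint32(^oldWord) + uint32(newWord)
	for sum > 0xffff { // fold carries back in (one's-complement arithmetic)
		sum = (sum >> 16) + (sum & 0xffff)
	}
	return ^uint16(sum)
}

That only gives the right result if hc was the packet's real checksum to begin with, which is what I'm wondering about here.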
@maddn I am sorry, but I don't understand your question. Could you be more detailed? If you are talking about outgoing packets, DNAT rules are applied in the host network namespace. If you are capturing packets inside the container, you should see the original packets generated by the container stack.
Sorry for the confusion; I'm referring to the NAT rule that Docker Desktop creates when you publish a port.
Start a container with a published port:
docker run -p 19080:19080 --rm -it alpine
In another terminal, enter the Docker VM and check the NAT rules:
~ % docker run -it --rm --privileged --pid=host justincormack/nsenter1
sh-5.2# iptables -L -t nat
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
DOCKER all -- anywhere anywhere ADDRTYPE match dst-type LOCAL
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
DOCKER all -- anywhere anywhere ADDRTYPE match dst-type LOCAL
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
MASQUERADE all -- anywhere anywhere ADDRTYPE match src-type LOCAL
MASQUERADE all -- 172.17.0.0/16 anywhere
MASQUERADE tcp -- 172.17.0.2 172.17.0.2 tcp dpt:19080
Chain DOCKER (2 references)
target prot opt source destination
DNAT tcp -- anywhere anywhere tcp dpt:19080 to:172.17.0.2:19080
I don't want to overcomplicate this issue, though.
The problem is that any NAT rule inside the container will result in invalid TCP checksums for traffic from published ports, because the packet always starts with a checksum of 0, so the incremental update can't produce the right value.
~ % docker run -p 19080:19080 --rm -it --cap-add=NET_ADMIN alpine
/ # iptables -A PREROUTING -p tcp -m tcp --dport 19080 -j DNAT --to-destination 198.18.134.29:80 -t nat
/ # tcpdump -i eth0 -vvv tcp port 19080 or tcp port 80
tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
18:52:01.558800 IP (tos 0x0, ttl 63, id 52988, offset 0, flags [none], proto TCP (6), length 60)
192.168.65.1.40394 > 7b8d50071a4f.19080: Flags [S], cksum 0x0000 (incorrect -> 0x6c70), seq 2127016582, win 65408, options [mss 65495,nop,nop,TS val 1196480061 ecr 0,nop,wscale 7], length 0
18:52:01.558820 IP (tos 0x0, ttl 62, id 52988, offset 0, flags [none], proto TCP (6), length 60)
192.168.65.1.40394 > 198.18.134.29.80: Flags [S], cksum 0xaa1b (incorrect -> 0x168c), seq 2127016582, win 65408, options [mss 65495,nop,nop,TS val 1196480061 ecr 0,nop,wscale 7], length 0
You can see that the packet starts with a checksum of 0x0000; after the DNAT rule is applied, the updated checksum is incorrect. The packet is then dropped at the destination.
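Here is a small self-contained Go sketch of why the incremental update can't work when it starts from a zero placeholder (the 16-bit words below are made up; only the shape of the calculation matters):

package main

import "fmt"

// fold reduces a 32-bit one's-complement sum to 16 bits (RFC 1071).
func fold(sum uint32) uint16 {
	for sum > 0xffff {
		sum = (sum >> 16) + (sum & 0xffff)
	}
	return uint16(sum)
}

// checksum is the one's-complement checksum over 16-bit words.
func checksum(words []uint16) uint16 {
	var sum uint32
	for _, w := range words {
		sum += uint32(w)
	}
	return ^fold(sum)
}

// incremental applies RFC 1624, eq. 3: HC' = ~(~HC + ~m + m').
func incremental(hc, oldWord, newWord uint16) uint16 {
	return ^fold(uint32(^hc) + uint32(^oldWord) + uint32(newWord))
}

func main() {
	// Made-up words: DNAT rewrites one 16-bit word of the destination
	// address; the rest of the pseudo-header/segment stays the same.
	oldDst, newDst := uint16(0xc0a8), uint16(0xac11)
	rest := []uint16{0x4101, 0x2007, 0x1388}

	trueOld := checksum(append([]uint16{oldDst}, rest...)) // what the header would hold without offload
	trueNew := checksum(append([]uint16{newDst}, rest...)) // what it should hold after DNAT

	// Updating from the packet's real checksum lands on the right value...
	fmt.Printf("from real checksum:    %#06x (want %#06x)\n", incremental(trueOld, oldDst, newDst), trueNew)
	// ...but updating from a zeroed (offloaded) checksum field does not.
	fmt.Printf("from zero placeholder: %#06x (want %#06x)\n", incremental(0, oldDst, newDst), trueNew)
}

The incremental result matches a freshly computed checksum only when the starting value was the packet's real checksum; starting from a zeroed field gives an essentially arbitrary value, which is what the capture above shows.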
@maddn you don't use runsc, do you? Did you file this bug because Docker Desktop uses the gVisor network stack internally?
Yes. If I switch the network type in Docker Desktop to vpnkit instead, this bug doesn't happen and the checksums are recalculated correctly. But when it's using the gVisor stack, the checksums are incorrect.