
Windows: TCP SACK slows the download speed

Open 380wmda999 opened this issue 1 year ago • 9 comments

Description

On Windows 11, when I set tcpip.TCPSACKEnabled(true) and test the download speed with iperf3, I only get about 1G/s. When I set tcpip.TCPSACKEnabled(false), the download speed with iperf3 is about 3G/s, but the program panics halfway through. The panic info is: panic: length < 0

goroutine 112 [running]:
gvisor.dev/gvisor/pkg/tcpip/stack.PacketData.CapLength({0x4290708}, 0x80000840)
	D:/gocode/src/gvisor.dev/gvisor/pkg/tcpip/stack/packet_buffer.go:581 +0x5b
gvisor.dev/gvisor/pkg/tcpip/transport/tcp.(*sender).splitSeg(0x432ca08, 0x428c788, 0x80000840)
	D:/gocode/src/gvisor.dev/gvisor/pkg/tcpip/transport/tcp/snd.go:630 +0x12d
gvisor.dev/gvisor/pkg/tcpip/transport/tcp.(*sender).maybeSendSegment(0x432ca08, 0x428c788, 0x80000840, 0x62d1c255)
	D:/gocode/src/gvisor.dev/gvisor/pkg/tcpip/transport/tcp/snd.go:899 +0x197
gvisor.dev/gvisor/pkg/tcpip/transport/tcp.(*sender).sendData(0x432ca08)
	D:/gocode/src/gvisor.dev/gvisor/pkg/tcpip/transport/tcp/snd.go:1022 +0x1bf
gvisor.dev/gvisor/pkg/tcpip/transport/tcp.(*Endpoint).sendData(0x4804508, 0x428c788)
	D:/gocode/src/gvisor.dev/gvisor/pkg/tcpip/transport/tcp/connect.go:1007 +0x4d
gvisor.dev/gvisor/pkg/tcpip/transport/tcp.(*Endpoint).Write(0x4804508, {0xebfff0, 0x40f8b40}, {0x0, 0x0, 0x0, 0x0, {0x0, 0x0, 0x0, ...}})
	D:/gocode/src/gvisor.dev/gvisor/pkg/tcpip/transport/tcp/endpoint.go:1663 +0x112
gvisor.dev/gvisor/pkg/tcpip/adapters/gonet.(*TCPConn).Write(0x4311480, {0x46ca000, 0xf000, 0xf000})
	D:/gocode/src/gvisor.dev/gvisor/pkg/tcpip/adapters/gonet/gonet.go:389 +0x23a
client-cdn/tun2socks/gvisor-netstack.copyConn({0xec4d54, 0x4311480}, {0xec4b14, 0x459c3e8}, 0x0)
	D:/gocode/src/client-cdn/tun2socks/gvisor-netstack/tcp.go:55 +0x138
client-cdn/tun2socks/gvisor-netstack.handleTCP(0x4311480)
	D:/gocode/src/client-cdn/tun2socks/gvisor-netstack/tcp.go:42 +0x1f8
client-cdn/tun2socks/gvisor-netstack.Start.func1(0x4040390)
	D:/gocode/src/client-cdn/tun2socks/gvisor-netstack/tun2socks.go:153 +0x25f
created by gvisor.dev/gvisor/pkg/tcpip/transport/tcp.(*Forwarder).HandlePacket in goroutine 303
	D:/gocode/src/gvisor.dev/gvisor/pkg/tcpip/transport/tcp/forwarder.go:98 +0x2c0

The adapter is wintun, and I routed 192.168.3.35 to the wintun gateway. The iperf3 command is: iperf3.exe -c 192.168.3.35 -R
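
Roughly how I toggle SACK on the stack (a minimal sketch with a simplified stack setup, not my actual tun2socks/wintun wiring):

    // Minimal sketch (assumed setup, not the real tun2socks code): toggling
    // SACK for every TCP endpoint created on a netstack instance.
    package main

    import (
    	"gvisor.dev/gvisor/pkg/tcpip"
    	"gvisor.dev/gvisor/pkg/tcpip/network/ipv4"
    	"gvisor.dev/gvisor/pkg/tcpip/stack"
    	"gvisor.dev/gvisor/pkg/tcpip/transport/tcp"
    )

    func newStack(sackEnabled bool) *stack.Stack {
    	s := stack.New(stack.Options{
    		NetworkProtocols:   []stack.NetworkProtocolFactory{ipv4.NewProtocol},
    		TransportProtocols: []stack.TransportProtocolFactory{tcp.NewProtocol},
    	})

    	// The option referenced above; it applies to all TCP endpoints
    	// created on this stack.
    	opt := tcpip.TCPSACKEnabled(sackEnabled)
    	if err := s.SetTransportProtocolOption(tcp.ProtocolNumber, &opt); err != nil {
    		panic(err)
    	}
    	return s
    }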

Is this feature related to a specific bug?

No response

Do you have a specific solution in mind?

No response

380wmda999 avatar Apr 30 '24 07:04 380wmda999

I'm having trouble seeing how a negative value gets passed in. That value could be negative due to casting in snd.go:

		if !seg.sequenceNumber.LessThan(end) {
			return false
		}

		available := int(seg.sequenceNumber.Size(end))
		if available == 0 {
			return false
		}

But because of the LessThan guard above, I don't see how this could happen. seg.sequenceNumber shouldn't be more than the maximum TCP window (~1GB) from end, so this should always fit neatly into an int and not overflow.
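
For reference, here is what that cast would do on a 32-bit build if the size ever did exceed math.MaxInt32 (a standalone sketch, not the netstack code path itself). seqnum.Size is a uint32, so the 0x80000840 from the trace would wrap negative under GOARCH=386:

    // Standalone sketch of the suspected wrap-around: int is 32 bits under
    // GOARCH=386, so converting a uint32 above math.MaxInt32 goes negative.
    package main

    import "fmt"

    func main() {
    	var size uint32 = 0x80000840 // the length seen in the panic trace
    	fmt.Println(int(size))
    	// GOARCH=386:   -2147481536 (negative -> "panic: length < 0")
    	// GOARCH=amd64:  2147485760 (no panic)
    }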

What sort of machine are you running on? Is it a 32 or 64 bit machine?

kevinGC avatar May 10 '24 16:05 kevinGC

What sort of machine are you running on? Is it a 32 or 64 bit machine?

The OS is Windows 11. I found that the panic happens when I compile with go1.22.2.windows-386; when I compile with go1.22.2.windows-amd64, there is no panic. So are 32-bit programs not supported on Windows?

380wmda999 avatar May 10 '24 16:05 380wmda999

One more question: I tested the native downlink with iperf3 on Windows, and it is approximately 10G/s. But through gvisor, even with SACK turned off, the downlink speed is only about 2G/s (with SACK on, it's about 1G slower still). Why is this? Is it a problem with the Windows operating system or with netstack performance?

380wmda999 avatar May 10 '24 17:05 380wmda999

It should work on 32 bit Windows, but clearly there's a bug here (likely an overflow).

Netstack will be slower than native. And since it is rarely used on Windows, I don't think it's ever been optimized for Windows at all. That huge slowdown isn't normal (and isn't what we see on other systems), and there's also whatever tun2socks is doing that could slow things down. But native will be faster.

kevinGC avatar May 10 '24 17:05 kevinGC

I also suspect that an overflow in 32-bit mode makes the length negative, causing the panic. When I set TCPSACKEnabled(true) there is no panic, but the measured downlink speed is very slow. So are there any optimization strategies available on Windows?

380wmda999 avatar May 10 '24 17:05 380wmda999

Unfortunately I don't think we have cycles to debug and optimize netstack on Windows, but we'd welcome the help if you're up to try.

kevinGC avatar May 10 '24 17:05 kevinGC

I may not have that ability, because I have run into many situations during use that I cannot explain. For example, on an Android phone I measured the native network speed at 600M/s downlink and 60M/s uplink, but after going through tun2socks with gvisor, the downlink speed was 500M/s while the uplink speed reached 120M/s.

380wmda999 avatar May 10 '24 18:05 380wmda999

Hey, could you try something for me? I have a suspicion about the slowdown from SACK. Can you apply this patch and give me the log output?

diff --git a/pkg/tcpip/transport/tcp/snd.go b/pkg/tcpip/transport/tcp/snd.go
index 5dccf81d1e...b1337faec3 100644
--- a/pkg/tcpip/transport/tcp/snd.go
+++ b/pkg/tcpip/transport/tcp/snd.go
@@ -16,6 +16,7 @@

 import (
        "fmt"
+       "log"
        "math"
        "sort"
        "time"
@@ -196,6 +197,7 @@
        // TCP options that it is including in the packets that it sends.
        // See: https://tools.ietf.org/html/rfc6691#section-2
        maxPayloadSize := int(mss) - ep.maxOptionSize()
+       log.Printf("tcp.newSender called with mss and opt size %d, %d", mss, ep.maxOptionSize())

        s := &sender{
                ep: ep,
@@ -297,6 +299,8 @@
 // attempts to retransmit the first packet above the MTU size.
 // +checklocks:s.ep.mu
 func (s *sender) updateMaxPayloadSize(mtu, count int) {
+       log.Printf("tcp.sender.updateMaxPayloadSize: mtu (%d), count (%d), s.MaxPayloadSize (%d)", mtu, count, s.MaxPayloadSize)
+
        m := mtu - header.TCPMinimumSize

        m -= s.ep.maxOptionSize()

It's possible that the endpoint is being set up with a very small MTU. In that case, enabling SACK could shrink the usable MSS enough that it affects throughput.
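
To make that concrete, a rough back-of-the-envelope comparison (the 36-byte option size is an assumed figure for timestamps plus reserved SACK block space, not an exact netstack constant):

    // Rough sketch of the MSS-shrinkage hypothesis: the same fixed option
    // overhead is a much larger fraction of each segment on a small MTU.
    package main

    import "fmt"

    const tcpMinimumSize = 20 // fixed TCP header bytes

    func payload(mtu, optionSize int) int {
    	return mtu - tcpMinimumSize - optionSize
    }

    func main() {
    	for _, mtu := range []int{65535, 1480, 576} {
    		fmt.Printf("mtu=%5d  no SACK: %5d  with SACK: %5d\n",
    			mtu, payload(mtu, 0), payload(mtu, 36))
    	}
    }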

kevinGC avatar May 15 '24 17:05 kevinGC

SACK enabled (true):

2024/05/16 04:15:03 Using existing driver 0.14
2024/05/16 04:15:03 Creating adapter
2024/05/16 04:15:03 tcp.newSender called with mss and opt size 65495, 36
2024/05/16 04:15:03 tcp.sender.updateMaxPayloadSize: mtu (1480), count (0), s.MaxPayloadSize (65459)
2024/05/16 04:15:04 tcp.newSender called with mss and opt size 65495, 36
2024/05/16 04:15:04 tcp.sender.updateMaxPayloadSize: mtu (1480), count (0), s.MaxPayloadSize (65459)
2024/05/16 04:15:04 tcp.newSender called with mss and opt size 65495, 36
2024/05/16 04:15:04 tcp.sender.updateMaxPayloadSize: mtu (1480), count (0), s.MaxPayloadSize (65459)
2024/05/16 04:15:13 tcp.newSender called with mss and opt size 65495, 36
2024/05/16 04:15:13 tcp.sender.updateMaxPayloadSize: mtu (1480), count (0), s.MaxPayloadSize (65459)
2024/05/16 04:15:13 tcp.newSender called with mss and opt size 65495, 36
2024/05/16 04:15:13 tcp.sender.updateMaxPayloadSize: mtu (1480), count (0), s.MaxPayloadSize (65459)

iperf out: iperf3.exe -c 192.168.2.5 -R

Connecting to host 192.168.2.5, port 5201
Reverse mode, remote host 192.168.2.5 is sending
[  4] local 192.168.200.200 port 61449 connected to 192.168.2.5 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec  68.3 MBytes   573 Mbits/sec
[  4]   1.00-2.00   sec  62.3 MBytes   523 Mbits/sec
[  4]   2.00-3.00   sec  62.8 MBytes   527 Mbits/sec
[  4]   3.00-4.00   sec  64.0 MBytes   537 Mbits/sec
[  4]   4.00-5.00   sec  65.0 MBytes   545 Mbits/sec
[  4]   5.00-6.00   sec  63.9 MBytes   536 Mbits/sec
[  4]   6.00-7.00   sec  62.8 MBytes   527 Mbits/sec
[  4]   7.00-8.00   sec  63.4 MBytes   532 Mbits/sec
[  4]   8.00-9.00   sec  62.8 MBytes   527 Mbits/sec
[  4]   9.00-10.00  sec  60.6 MBytes   509 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec   648 MBytes   544 Mbits/sec                  sender
[  4]   0.00-10.00  sec   636 MBytes   534 Mbits/sec                  receiver

SACK enabled (false):

2024/05/16 04:17:11 Using existing driver 0.14
2024/05/16 04:17:11 Creating adapter
2024/05/16 04:17:12 tcp.newSender called with mss and opt size 65495, 0
2024/05/16 04:17:12 tcp.sender.updateMaxPayloadSize: mtu (1480), count (0), s.MaxPayloadSize (65495)
2024/05/16 04:17:13 tcp.newSender called with mss and opt size 65495, 0
2024/05/16 04:17:13 tcp.sender.updateMaxPayloadSize: mtu (1480), count (0), s.MaxPayloadSize (65495)
2024/05/16 04:17:13 tcp.newSender called with mss and opt size 65495, 0
2024/05/16 04:17:13 tcp.sender.updateMaxPayloadSize: mtu (1480), count (0), s.MaxPayloadSize (65495)
2024/05/16 04:17:18 tcp.newSender called with mss and opt size 65495, 0
2024/05/16 04:17:18 tcp.sender.updateMaxPayloadSize: mtu (1480), count (0), s.MaxPayloadSize (65495)
2024/05/16 04:17:18 tcp.newSender called with mss and opt size 65495, 0
2024/05/16 04:17:18 tcp.sender.updateMaxPayloadSize: mtu (1480), count (0), s.MaxPayloadSize (65495)
2024/05/16 04:17:47 tcp.newSender called with mss and opt size 65495, 0
2024/05/16 04:17:47 tcp.sender.updateMaxPayloadSize: mtu (1480), count (0), s.MaxPayloadSize (65495)
2024/05/16 04:17:47 tcp.newSender called with mss and opt size 65495, 0
2024/05/16 04:17:47 tcp.sender.updateMaxPayloadSize: mtu (1480), count (0), s.MaxPayloadSize (65495)

iperf out: iperf3.exe -c 192.168.2.5 -R

Connecting to host 192.168.2.5, port 5201
Reverse mode, remote host 192.168.2.5 is sending
[  4] local 192.168.200.200 port 61607 connected to 192.168.2.5 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   175 MBytes  1.46 Gbits/sec
[  4]   1.00-2.00   sec   169 MBytes  1.42 Gbits/sec
[  4]   2.00-3.00   sec   171 MBytes  1.43 Gbits/sec
[  4]   3.00-4.00   sec   172 MBytes  1.44 Gbits/sec
[  4]   4.00-5.00   sec   162 MBytes  1.36 Gbits/sec
[  4]   5.00-6.00   sec   167 MBytes  1.40 Gbits/sec
[  4]   6.00-7.00   sec   168 MBytes  1.41 Gbits/sec
[  4]   7.00-8.00   sec   165 MBytes  1.38 Gbits/sec
[  4]   8.00-9.00   sec   166 MBytes  1.40 Gbits/sec
[  4]   9.00-10.00  sec   164 MBytes  1.38 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec  1.65 GBytes  1.42 Gbits/sec                  sender
[  4]   0.00-10.00  sec  1.64 GBytes  1.41 Gbits/sec                  receiver

iperf3 raw (native) speed output:

Connecting to host 192.168.2.5, port 5201
Reverse mode, remote host 192.168.2.5 is sending
[  4] local 192.168.2.5 port 61385 connected to 192.168.2.5 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec  1.13 GBytes  9.73 Gbits/sec
[  4]   1.00-2.00   sec   881 MBytes  7.39 Gbits/sec
[  4]   2.00-3.00   sec  1.28 GBytes  11.0 Gbits/sec
[  4]   3.00-4.00   sec   818 MBytes  6.86 Gbits/sec
[  4]   4.00-5.01   sec  1.34 GBytes  11.4 Gbits/sec
[  4]   5.01-6.00   sec  1.07 GBytes  9.22 Gbits/sec
[  4]   6.00-7.00   sec  1022 MBytes  8.60 Gbits/sec
[  4]   7.00-8.00   sec   688 MBytes  5.77 Gbits/sec
[  4]   8.00-9.00   sec  1.09 GBytes  9.41 Gbits/sec
[  4]   9.00-10.00  sec  1.02 GBytes  8.73 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec  10.3 GBytes  8.81 Gbits/sec                  sender
[  4]   0.00-10.00  sec  10.3 GBytes  8.81 Gbits/sec                  receiver

380wmda999 avatar May 15 '24 20:05 380wmda999