quinn stream is 10x slower than TCP/UDP, what could be wrong?

tubzby opened this issue 9 months ago · 11 comments

I have a 5G connection with a download limit of about 100 Mbit/s and a server with a public IP. I noticed that the download speed over the QUIC connection is much lower than I expected, only around 4-5 Mbit/s, so I ran a test:

  1. Create a 100MB file on the server.
  2. Compare the download speed of tokio TcpStream, tokio UdpSocket, and quinn SendStream/RecvStream.

The result:

  1. tokio TcpStream
read: 104788840 bytes in 15 seconds, speed: 6985922 bytes/s
  2. tokio UdpSocket
read: 104096000 bytes in 13 seconds, speed: 8007384 bytes/s
  3. quinn
read: 103257760 bytes in 186 seconds, speed: 555149 bytes/s

Part of the program:

struct RawQuicServer {
    arg: ServerArg,
}

impl RawQuicServer {
    async fn run(&mut self) -> anyhow::Result<()> {
        println!("listening at: {}", self.arg.port);

        let addr = format!("0.0.0.0:{}", self.arg.port);
        let mut acceptor = MyQAcceptor::new(&addr)?;
        loop {
            let conn = acceptor.accept().await?;
            println!("got connection from: {}", conn.remote_address());
            self.serve(conn).await?;
        }
    }

    async fn serve(&mut self, conn: quinn::Connection) -> anyhow::Result<()> {
        let mut buf = vec![0u8; 1400];

        let (mut tx, _rx) = conn.accept_bi().await?;
        let mut file = File::open(&self.arg.file).await?;
        while let Ok(n) = file.read(&mut buf).await {
            if n == 0 {
                break;
            }
            tx.write(&buf[..n]).await.context("write failed")?;
        }
        println!("finished");
        Ok(())
    }
}

struct RawQuicClient {
    arg: ClientArg,
}

impl RawQuicClient {
    async fn run(&mut self) -> anyhow::Result<()> {
        let server = format!("{}:{}", self.arg.server, self.arg.port);

        let client_cfg = configure_client()?;
        let mut endpoint = quinn::Endpoint::client("0.0.0.0:0".parse()?)?;
        endpoint.set_default_client_config(client_cfg);

        let addr = server.parse()?;
        let conn = endpoint.connect(addr, "localhost")?.await?;

        println!("connected");

        let (mut tx, mut rx) = conn.open_bi().await?;
        tx.write("knock".as_bytes()).await?;

        let mut buf = vec![0u8; 1400];
        let mut read = 0;
        let start = Instant::now();
        let mut last = Instant::now();

        let mut tick = interval(Duration::from_millis(10));
        loop {
            tokio::select! {
                result = rx.read(&mut buf) => {
                    let result = match result {
                        Ok(x) => x,
                        Err(_) => break,
                    };
                    let n = match result {
                        Some(x) => x,
                        None => break,
                    };
                    if n == 0 {
                        println!("read finished");
                        break;
                    }
                    read += n;
                },
                _ = tick.tick() => {
                    if last.elapsed() >= Duration::from_secs(2) {
                        println!("read: {read} bytes");
                        last = Instant::now();
                    }
                }
            }
        }

        let elapsed = start.elapsed().as_secs();
        if elapsed == 0 {
            panic!("fail too soon");
        }
        println!(
            "read: {} bytes in {} seconds, speed: {}",
            read,
            elapsed,
            read as u64 / elapsed
        );
        Ok(())
    }
}

The tokio TCP/UDP test code is more or less the same.

A few notes:

  1. I have added some flow control to the UDP sender to ensure it matches the bandwidth.
  2. The test program is built in release mode.
  3. All the TCP/UDP/QUIC code is in one .rs file.
  4. iperf3 on the client:
iperf3 -c SERVER_ADDRESS -i 1 -t 10 -b 90m -u -R
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-1.00   sec  10.7 MBytes  90.1 Mbits/sec  0.152 ms  4/8120 (0.049%)
[  5]   1.00-2.00   sec  8.51 MBytes  71.4 Mbits/sec  0.117 ms  0/6431 (0%)
[  5]   2.00-3.00   sec  12.9 MBytes   108 Mbits/sec  0.147 ms  0/9758 (0%)
[  5]   3.00-4.00   sec  10.8 MBytes  90.3 Mbits/sec  0.124 ms  0/8129 (0%)
[  5]   4.00-5.00   sec  10.7 MBytes  89.9 Mbits/sec  0.118 ms  13/8105 (0.16%)
[  5]   5.00-6.00   sec  10.7 MBytes  89.5 Mbits/sec  0.123 ms  0/8062 (0%)
[  5]   6.00-7.00   sec  10.8 MBytes  90.5 Mbits/sec  0.114 ms  0/8149 (0%)
[  5]   7.00-8.00   sec  10.6 MBytes  89.2 Mbits/sec  0.138 ms  60/8097 (0.74%)
[  5]   8.00-9.00   sec  10.7 MBytes  90.0 Mbits/sec  0.134 ms  0/8102 (0%)
[  5]   9.00-10.00  sec  10.7 MBytes  90.0 Mbits/sec  0.118 ms  8/8116 (0.099%)
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.04  sec   108 MBytes  90.0 Mbits/sec  0.000 ms  0/81410 (0%)  sender
[  5]   0.00-10.00  sec   107 MBytes  89.9 Mbits/sec  0.118 ms  85/81069 (0.1%)  receiver

tubzby avatar Feb 12 '25 09:02 tubzby

Quinn includes encryption via TLS, while TCP/UDP don't.

djc avatar Feb 12 '25 12:02 djc

I tested it on another client (same hardware) with a wired connection; it reached 6462908 bytes/s, a bit lower than TCP, which is reasonable.

Can a high RTT be the cause? On 5G it is about 100 ms, compared to 18 ms on the wired connection. If so, can I increase quinn's snd_buf/recv_buf?

tubzby avatar Feb 13 '25 02:02 tubzby
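
As an aside on the snd_buf/recv_buf part of the question: quinn's own flow-control windows are configured through TransportConfig (discussed below), but the OS-level UDP buffers can be enlarged by creating the socket yourself and handing it to the endpoint. A rough sketch, assuming the socket2 crate and a recent quinn; the 4 MB sizes are arbitrary, and client_cfg would be the config produced by configure_client above:

// Build a client endpoint on top of a UDP socket with enlarged OS buffers.
use socket2::{Domain, Protocol, Socket, Type};

fn client_endpoint_with_large_buffers(
    client_cfg: quinn::ClientConfig,
) -> anyhow::Result<quinn::Endpoint> {
    let socket = Socket::new(Domain::IPV4, Type::DGRAM, Some(Protocol::UDP))?;
    socket.set_recv_buffer_size(4 * 1024 * 1024)?; // SO_RCVBUF
    socket.set_send_buffer_size(4 * 1024 * 1024)?; // SO_SNDBUF
    let addr: std::net::SocketAddr = "0.0.0.0:0".parse()?;
    socket.bind(&addr.into())?;

    let runtime = quinn::default_runtime()
        .ok_or_else(|| anyhow::anyhow!("no async runtime found"))?;
    let mut endpoint = quinn::Endpoint::new(
        quinn::EndpointConfig::default(),
        None,          // client-only endpoint, no server config
        socket.into(), // socket2::Socket -> std::net::UdpSocket
        runtime,
    )?;
    endpoint.set_default_client_config(client_cfg);
    Ok(endpoint)
}

Note that Linux caps these buffer sizes at net.core.rmem_max/wmem_max, so the effective values may be smaller than requested.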

Can a high RTT be the cause?

In theory, yes. See https://en.wikipedia.org/wiki/Bandwidth-delay_product.

thomaseizinger avatar Feb 13 '25 03:02 thomaseizinger
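
For a sense of scale, the bandwidth-delay product for the figures mentioned in this thread (about 100 Mbit/s and roughly 100 ms RTT, both taken from the comments above) works out as follows; the snippet is just the arithmetic:

fn main() {
    // Assumed link characteristics from the thread, not re-measured here.
    let bandwidth_bits_per_sec: f64 = 100_000_000.0; // ~100 Mbit/s
    let rtt_secs: f64 = 0.100; // ~100 ms on the 5G path

    // BDP = bandwidth * RTT: how much data must be in flight to keep the link full.
    let bdp_bytes = bandwidth_bits_per_sec * rtt_secs / 8.0;
    println!("BDP ≈ {bdp_bytes:.0} bytes (~1.25 MB)");
    // Any window (per stream or per connection) smaller than this caps throughput.
}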

Can a high RTT be the cause?

In theory, yes. See https://en.wikipedia.org/wiki/Bandwidth-delay_product.

Thanks for the link. It seems that TCP can increase its window via window scaling; can QUIC do the same?

tubzby avatar Feb 13 '25 03:02 tubzby

The various window and buffer_size parameters to TransportConfig may be of interest. The default settings are specifically designed to be adequate for 100Mbps at 100ms, so if you do find some settings that work better, please let us know!

Another possibility to explore is packet loss.

Ralith avatar Feb 13 '25 22:02 Ralith
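
For reference, a sketch of what raising those parameters can look like, installed on both the client and the server; the values below are illustrative (a few multiples of the ~1.25 MB BDP estimated above), not recommendations:

use std::sync::Arc;
use quinn::{ClientConfig, ServerConfig, TransportConfig, VarInt};

fn tuned_transport() -> TransportConfig {
    let mut transport = TransportConfig::default();
    // Per-stream receive window (stream-level flow control).
    transport.stream_receive_window(VarInt::from_u32(2_500_000));
    // Connection-wide receive window. The default is already VarInt::MAX
    // (effectively unlimited); shown here only to mark where the knob lives.
    transport.receive_window(VarInt::from_u32(10_000_000));
    // How much unacknowledged data the sender is willing to buffer.
    transport.send_window(10_000_000);
    transport
}

// The same Arc<TransportConfig> can be installed on both sides.
fn apply(client: &mut ClientConfig, server: &mut ServerConfig) {
    let transport = Arc::new(tuned_transport());
    client.transport_config(transport.clone());
    server.transport_config(transport);
}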

How can I observe packet loss through quinn?

----EDIT 1----

There's ConnectionStats.

tubzby avatar Feb 13 '25 23:02 tubzby
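
For anyone else looking: the statistics are available on the connection handle via Connection::stats(). A minimal sketch (exact field names may differ slightly between quinn versions):

// Print a few path/loss statistics for a quinn::Connection.
fn print_stats(conn: &quinn::Connection) {
    let stats = conn.stats();
    println!(
        "rtt: {:?}, cwnd: {} bytes, lost_packets: {}, congestion_events: {}",
        stats.path.rtt,
        stats.path.cwnd,
        stats.path.lost_packets,
        stats.path.congestion_events,
    );
}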

I have increased the window size and buffer size (on both the server and client), but it didn't help:

fn set_transport_config(config: &mut TransportConfig) {
    // STREAM_RWND: 12500 * 100
    // stream_receive_window: STREAM_RWND
    // send_window: 8 * STREAM_RWND
    // crypto_buffer_size: 16 * 1024

    let stream_rwnd = 12500 * 100;
    // increase wnd *2
    let stream_rwnd = stream_rwnd * 2;
    let crypto_buffer_size = 16 * 1024;
    // increase *2
    let crypto_buffer_size = crypto_buffer_size * 2;

    config.stream_receive_window((stream_rwnd as u32).into());
    config.send_window(8 * stream_rwnd);
    config.crypto_buffer_size(crypto_buffer_size);
}

Lost packets on the client are quite low (the print interval is 2 seconds):

read: 37927414 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 38407326 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 38779318 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 39090255 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 39523313 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 39900975 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 40196299 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 40476012 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 40768502 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 41289600 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 41898718 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 42361589 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 42570305 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 42922423 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 43363415 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 43806485 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 44256564 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 44702395 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 45058779 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 45454929 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 46045595 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 46507047 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 47082105 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 47803392 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 48538892 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 49142337 bytes, lost_packets: 3, lost_plpmtud_probes: 0

The test process consumes about 15% of CPU.

tubzby avatar Feb 14 '25 01:02 tubzby

You need to configure connection-level window sizes, not just stream-level. The smaller of the two limits applies.

Ralith avatar Feb 14 '25 01:02 Ralith

You need to configure connection-level window sizes, not just stream-level. The smaller of the two limits applies.

What do you mean by stream-level? I didn't see any API related to window/buffer settings inside SendStream and RecvStream.

I set the TransportConfig on both ServerConfig and ClientConfig before the connection was made.

tubzby avatar Feb 14 '25 02:02 tubzby

Sorry, I think you mean TransportConfig.receive_window.

It is initialized to VarInt::MAX, so I had ignored that part.

tubzby avatar Feb 14 '25 02:02 tubzby

Ah, right, that should be fine then, assuming you're installing the config correctly. Next step would be to investigate what precisely it's waiting for. I don't have the bandwidth to do this personally right now, but you could approach the problem by investigating a decrypted packet capture, and/or digging into quinn_proto::Connection::poll_transmit and poll_timeout to see what's preventing data from being transmitted more often.

Ralith avatar Feb 14 '25 03:02 Ralith
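
One practical route to a decrypted capture: if configure_client builds its rustls ClientConfig by hand, rustls can export the TLS secrets in SSLKEYLOGFILE format, which Wireshark can use to decrypt the captured QUIC packets. A sketch, assuming the plain rustls::ClientConfig is accessible before it is wrapped for quinn:

use std::sync::Arc;

// Write TLS secrets to the file named by the SSLKEYLOGFILE environment variable.
fn enable_key_log(crypto: &mut rustls::ClientConfig) {
    crypto.key_log = Arc::new(rustls::KeyLogFile::new());
}

Run the client with SSLKEYLOGFILE set to some path and point Wireshark's TLS protocol preferences at that file.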