quinn stream is 10x slower than TCP/UDP, what could be wrong?
I have a 5G network with a download limit of about 100 Mbit/s and a server with a public IP. I noticed that the download speed over a QUIC connection is much lower than I expected, only around 4-5 Mbit/s, so I ran a test:
- Create a 100 MB file on the server.
- Compare the download speed of tokio TcpStream, tokio UdpSocket, and quinn SendStream/RecvStream.
The results:
- tokio TcpStream
  read: 104788840 bytes in 15 seconds, speed: 6985922 bytes/s
- tokio UdpSocket
  read: 104096000 bytes in 13 seconds, speed: 8007384 bytes/s
- quinn
  read: 103257760 bytes in 186 seconds, speed: 555149 bytes/s
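(For reference: 555149 bytes/s is about 4.4 Mbit/s for quinn, versus roughly 55.9 and 64.1 Mbit/s for TCP and UDP, so the gap is a bit over 10x.)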
Part of the program:
struct RawQuicServer {
    arg: ServerArg,
}

impl RawQuicServer {
    async fn run(&mut self) -> anyhow::Result<()> {
        println!("listening at: {}", self.arg.port);
        let addr = format!("0.0.0.0:{}", self.arg.port);
        let mut acceptor = MyQAcceptor::new(&addr)?;
        loop {
            let conn = acceptor.accept().await?;
            println!("got connection from: {}", conn.remote_address());
            self.serve(conn).await?;
        }
    }

    async fn serve(&mut self, conn: quinn::Connection) -> anyhow::Result<()> {
        let mut buf = vec![0u8; 1400];
        let (mut tx, _rx) = conn.accept_bi().await?;
        let mut file = File::open(&self.arg.file).await?;
        while let Ok(n) = file.read(&mut buf).await {
            if n == 0 {
                break;
            }
            // note: SendStream::write may accept fewer than n bytes;
            // the return value is not checked here
            tx.write(&buf[..n]).await.context("write failed")?;
        }
        println!("finished");
        Ok(())
    }
}
struct RawQuicClient {
    arg: ClientArg,
}

impl RawQuicClient {
    async fn run(&mut self) -> anyhow::Result<()> {
        let server = format!("{}:{}", self.arg.server, self.arg.port);
        let client_cfg = configure_client()?;
        let mut endpoint = quinn::Endpoint::client("0.0.0.0:0".parse()?)?;
        endpoint.set_default_client_config(client_cfg);
        let addr = server.parse()?;
        let conn = endpoint.connect(addr, "localhost")?.await?;
        println!("connected");
        let (mut tx, mut rx) = conn.open_bi().await?;
        tx.write("knock".as_bytes()).await?;
        let mut buf = vec![0u8; 1400];
        let mut read = 0;
        let start = Instant::now();
        let mut last = Instant::now();
        let mut tick = interval(Duration::from_millis(10));
        loop {
            tokio::select! {
                result = rx.read(&mut buf) => {
                    let result = match result {
                        Ok(x) => x,
                        Err(_) => break,
                    };
                    // Ok(None) means the peer finished the stream
                    let n = match result {
                        Some(x) => x,
                        None => break,
                    };
                    if n == 0 {
                        println!("read finished");
                        break;
                    }
                    read += n;
                },
                _ = tick.tick() => {
                    if last.elapsed() >= Duration::from_secs(2) {
                        println!("read: {read} bytes");
                        last = Instant::now();
                    }
                }
            }
        }
        let elapsed = start.elapsed().as_secs();
        if elapsed == 0 {
            panic!("fail too soon");
        }
        println!(
            "read: {} bytes in {} seconds, speed: {}",
            read,
            elapsed,
            read as u64 / elapsed
        );
        Ok(())
    }
}
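As an aside on the server's send loop: SendStream::write may accept fewer bytes than it is given, and its return value is not checked above. A variant that writes each chunk out in full via write_all, with larger chunks, might look roughly like this (send_file and the 64 KiB buffer are illustrative, not part of the test program):

use anyhow::Result;
use tokio::io::AsyncReadExt;

// Illustrative sketch, not the code used for the numbers above.
async fn send_file(tx: &mut quinn::SendStream, path: &str) -> Result<()> {
    let mut file = tokio::fs::File::open(path).await?;
    // larger chunks reduce per-write overhead compared to 1400-byte buffers
    let mut buf = vec![0u8; 64 * 1024];
    loop {
        let n = file.read(&mut buf).await?;
        if n == 0 {
            break; // end of file
        }
        // write_all retries internally until the whole chunk is accepted
        tx.write_all(&buf[..n]).await?;
    }
    Ok(())
}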
The tokio TCP/UDP code is more or less the same.
A few notes:
- I added some flow control to the UDP sender so that its send rate roughly matches the available bandwidth.
- The test program is built in release mode.
- All the TCP/UDP/QUIC code is in one .rs file.
- iperf3 from the client:
iperf3 -c SERVER_ADDRESS -i 1 -t 10 -b 90m -u -R
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-1.00 sec 10.7 MBytes 90.1 Mbits/sec 0.152 ms 4/8120 (0.049%)
[ 5] 1.00-2.00 sec 8.51 MBytes 71.4 Mbits/sec 0.117 ms 0/6431 (0%)
[ 5] 2.00-3.00 sec 12.9 MBytes 108 Mbits/sec 0.147 ms 0/9758 (0%)
[ 5] 3.00-4.00 sec 10.8 MBytes 90.3 Mbits/sec 0.124 ms 0/8129 (0%)
[ 5] 4.00-5.00 sec 10.7 MBytes 89.9 Mbits/sec 0.118 ms 13/8105 (0.16%)
[ 5] 5.00-6.00 sec 10.7 MBytes 89.5 Mbits/sec 0.123 ms 0/8062 (0%)
[ 5] 6.00-7.00 sec 10.8 MBytes 90.5 Mbits/sec 0.114 ms 0/8149 (0%)
[ 5] 7.00-8.00 sec 10.6 MBytes 89.2 Mbits/sec 0.138 ms 60/8097 (0.74%)
[ 5] 8.00-9.00 sec 10.7 MBytes 90.0 Mbits/sec 0.134 ms 0/8102 (0%)
[ 5] 9.00-10.00 sec 10.7 MBytes 90.0 Mbits/sec 0.118 ms 8/8116 (0.099%)
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-10.04 sec 108 MBytes 90.0 Mbits/sec 0.000 ms 0/81410 (0%) sender
[ 5] 0.00-10.00 sec 107 MBytes 89.9 Mbits/sec 0.118 ms 85/81069 (0.1%) receiver
Quinn includes encryption via TLS, while TCP/UDP don't.
I tested it on another client (same hardware) with a wired connection; it reaches 6462908 bytes/s, a bit lower than TCP, which is reasonable.
Can a high RTT be the cause? On 5G it is about 100 ms, compared to 18 ms for the wired connection. If so, can I increase quinn's snd_buf/recv_buf?
Can a high RTT be the cause?
In theory, yes. See https://en.wikipedia.org/wiki/Bandwidth-delay_product.
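For scale: at roughly 100 Mbit/s with a 100 ms RTT, the bandwidth-delay product is about

100 Mbit/s × 0.1 s = 10 Mbit ≈ 1.25 MB,

so around 1.25 MB of data needs to be in flight (and the receive window must be at least that large) to keep the link full; a smaller window caps throughput at roughly window / RTT.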
Thanks for the link. It seems that TCP can use window scaling; can QUIC do something similar?
The various window and buffer_size parameters to TransportConfig may be of interest. The default settings are specifically designed to be adequate for 100Mbps at 100ms, so if you do find some settings that work better, please let us know!
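For reference, wiring a tuned TransportConfig into the endpoint configs looks roughly like this. It is a minimal sketch with illustrative values; tuned_transport and apply are hypothetical helper names, and setter names can differ slightly between quinn versions (some versions expose a public transport field on ServerConfig instead of a setter):

use std::sync::Arc;
use quinn::TransportConfig;

// Hypothetical helper: build one tuned TransportConfig and install it on both sides.
fn tuned_transport() -> Arc<TransportConfig> {
    let mut t = TransportConfig::default();
    // illustrative values, not recommendations
    t.stream_receive_window(quinn::VarInt::from_u32(2_500_000));
    t.send_window(20_000_000);
    Arc::new(t)
}

fn apply(client: &mut quinn::ClientConfig, server: &mut quinn::ServerConfig) {
    let transport = tuned_transport();
    client.transport_config(transport.clone());
    server.transport_config(transport);
}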
Another possibility to explore is packet loss.
How can I observe packet loss through quinn?
----EDIT 1----
There's ConnectionStats.
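Reading it looks roughly like this, using the conn and read variables from the client loop above (a sketch; the exact field layout may differ between quinn versions):

// snapshot the connection statistics and pull out the loss counters
// (these live under the per-path stats)
let stats = conn.stats();
println!(
    "read: {} bytes, lost_packets: {}, lost_plpmtud_probes: {}",
    read, stats.path.lost_packets, stats.path.lost_plpmtud_probes
);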
I have increased the window size and buffer size (on both the server and client), but it didn't help:
fn set_transport_config(config: &mut TransportConfig) {
    // quinn defaults:
    //   STREAM_RWND = 12500 * 100
    //   stream_receive_window = STREAM_RWND
    //   send_window = 8 * STREAM_RWND
    //   crypto_buffer_size = 16 * 1024
    let stream_rwnd = 12500 * 100;
    // increase window *2
    let stream_rwnd = stream_rwnd * 2;
    let crypto_buffer_size = 16 * 1024;
    // increase *2
    let crypto_buffer_size = crypto_buffer_size * 2;
    config.stream_receive_window((stream_rwnd as u32).into());
    config.send_window(8 * stream_rwnd);
    config.crypto_buffer_size(crypto_buffer_size);
}
Lost packets on the client are quite low (the print interval is 2 seconds):
read: 37927414 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 38407326 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 38779318 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 39090255 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 39523313 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 39900975 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 40196299 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 40476012 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 40768502 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 41289600 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 41898718 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 42361589 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 42570305 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 42922423 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 43363415 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 43806485 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 44256564 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 44702395 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 45058779 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 45454929 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 46045595 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 46507047 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 47082105 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 47803392 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 48538892 bytes, lost_packets: 3, lost_plpmtud_probes: 0
read: 49142337 bytes, lost_packets: 3, lost_plpmtud_probes: 0
The test process consumes about 15% of CPU.
You need to configure connection-level window sizes, not just stream-level. The smaller of the two limits applies.
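Concretely, given a &mut TransportConfig named transport, something like this (a sketch; receive_window takes a VarInt in recent quinn versions, and the values are illustrative):

// connection-level window: caps unacked data across all streams combined
transport.receive_window(quinn::VarInt::from_u32(20_000_000));
// per-stream window: caps unacked data on each individual stream
transport.stream_receive_window(quinn::VarInt::from_u32(2_500_000));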
What do you mean by stream-level? I didn't see any API related to window/buffer settings inside SendStream and RecvStream.
I set the TransportConfig on both ServerConfig and ClientConfig before the connection was made.
Sorry, I think you mean TransportConfig.receive_window.
It is initialized to VarInt::MAX, so I ignored that part.
Ah, right, that should be fine then, assuming you're installing the config correctly. Next step would be to investigate what precisely it's waiting for. I don't have the bandwidth to do this personally right now, but you could approach the problem by investigating a decrypted packet capture, and/or digging into quinn_proto::Connection::poll_transmit and poll_timeout to see what's preventing data from being transmitted more often.
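For the packet-capture route: if configure_client() builds the rustls ClientConfig itself, one way to get a decrypted capture in Wireshark is to log the TLS secrets via SSLKEYLOGFILE. A sketch, assuming direct access to the rustls config (how it is then handed to quinn depends on the quinn version); enable_key_log is a hypothetical helper name:

use std::sync::Arc;

// enable TLS key logging so Wireshark can decrypt the captured QUIC traffic;
// `crypto` is assumed to be the rustls::ClientConfig built in configure_client()
fn enable_key_log(crypto: &mut rustls::ClientConfig) {
    // writes session secrets to the file named by the SSLKEYLOGFILE env var
    crypto.key_log = Arc::new(rustls::KeyLogFile::new());
}

Then run the client with SSLKEYLOGFILE pointing at a writable file and feed that file to Wireshark's TLS "(Pre)-Master-Secret log filename" setting.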