
QUIC datagram outbound backpressure

Open mxinden opened this issue 4 months ago • 2 comments

Neqo currently has a default queue size for outbound QUIC datagrams of 10:

https://github.com/mozilla/neqo/blob/f1df4233212d9707572e799f7b912074e40b6290/neqo-transport/src/connection/params.rs#L42

When exceeding that queue size, neqo-transport drops at the head of the queue:

https://github.com/mozilla/neqo/blob/f1df4233212d9707572e799f7b912074e40b6290/neqo-transport/src/quic_datagrams.rs#L158-L167
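
For illustration, a minimal sketch of that behavior (not the actual neqo code; `enqueue_with_head_drop` and its signature are made up for this example):

```rust
use std::collections::VecDeque;

// Sketch of the current policy: when the queue is at capacity, the
// oldest queued datagram is discarded to make room for the new one.
fn enqueue_with_head_drop(queue: &mut VecDeque<Vec<u8>>, max: usize, datagram: Vec<u8>) {
    if queue.len() >= max {
        queue.pop_front(); // head-drop: the oldest datagram is lost
    }
    queue.push_back(datagram);
}
```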

The queue was introduced in https://github.com/mozilla/neqo/pull/1220.

While dropping is better than an infinite queue, it still produces unnecessary packet loss. A drop is a signal to the sender's congestion controller, but the queue might not actually be the path bottleneck; it might simply be unable to absorb a short burst. Given that this is all on the same machine, I don't think our 1 ms pacer accuracy suffices to prevent such bursts.

Instead of dropping at the head, I suggest we propagate backpressure to the sender. I see two options (sketched after the list):

  1. Return the datagram to the sender when the queue is full, and notify when space is available so it can be retried.
  2. Allow the sender to query the current outbound datagram queue size.
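
A rough Rust sketch of what these two options could look like, using a stand-in `DatagramQueue` type (all names here are illustrative, not neqo's actual API):

```rust
use std::collections::VecDeque;

/// Illustrative error type. A real implementation of option (1) would
/// also emit an event when space frees up so the sender can retry.
#[derive(Debug)]
pub enum SendDatagramError {
    /// The queue is full; the datagram is handed back unmodified so the
    /// caller can retry (or tail-drop it at its own discretion).
    QueueFull(Vec<u8>),
}

/// Minimal stand-in for the outbound datagram queue.
pub struct DatagramQueue {
    queue: VecDeque<Vec<u8>>,
    capacity: usize,
}

impl DatagramQueue {
    pub fn new(capacity: usize) -> Self {
        Self {
            queue: VecDeque::with_capacity(capacity),
            capacity,
        }
    }

    /// Option (1): return the datagram to the sender instead of dropping.
    pub fn push(&mut self, datagram: Vec<u8>) -> Result<(), SendDatagramError> {
        if self.queue.len() >= self.capacity {
            return Err(SendDatagramError::QueueFull(datagram));
        }
        self.queue.push_back(datagram);
        Ok(())
    }

    /// Option (2): let the sender query the remaining queue space.
    pub fn remaining_capacity(&self) -> usize {
        self.capacity - self.queue.len()
    }
}
```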

Note that a sender could still choose to drop on its own. (Though arguably, that would be a tail-drop rather than a head-drop. Related: why do we currently do a head-drop?)

For WebTransport traffic, I assume propagating backpressure to the JavaScript context is too difficult.

For MASQUE connect-udp (https://github.com/mozilla/neqo/pull/2796), backpressure would allow us to prioritize the proxy connection over the proxied connection when a queue is building.
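
As a hedged sketch of that prioritization, reusing the hypothetical `DatagramQueue` from above (the policy and threshold are made up for illustration):

```rust
/// Hold back the proxied connection's datagrams while the proxy
/// connection's outbound queue is congested, keeping headroom for the
/// proxy connection's own, higher-priority traffic.
fn forward_proxied_datagram(queue: &mut DatagramQueue, datagram: Vec<u8>) {
    const HEADROOM: usize = 2; // illustrative threshold
    // Only enqueue proxied traffic while headroom remains for the proxy
    // connection's own datagrams.
    if queue.remaining_capacity() > HEADROOM {
        queue.push(datagram).expect("headroom was just checked");
    }
    // Otherwise, drop the datagram here; the proxied connection will
    // treat it as lost and retransmit (see the comment below).
}
```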

Related: we had the same issue in Firefox when sending UDP (not QUIC) datagrams. When the UDP socket wasn't ready to send, we would drop the datagram, leading to spurious packet loss and thus unnecessary congestion control events. See https://phabricator.services.mozilla.com/D239162 for details.

mxinden avatar Aug 13 '25 16:08 mxinden

/cc @KershawChang, since we discussed this yesterday. For now, for MASQUE, we can rely on the proxied connection to retransmit when the proxy connection drops a datagram. Long term, I don't think we can achieve high throughput without some way to prevent loss on bursts (e.g., through option (1) or (2) above).

mxinden avatar Aug 13 '25 16:08 mxinden

Ten seems short anyway. I think we want at least as many datagrams as will fit into a GSO batch. Agree that forwarding the backpressure signal is the right thing to do.
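
For scale, a back-of-the-envelope floor (the constant names are made up for illustration):

```rust
// Linux limits a single UDP GSO send to 64 segments (UDP_MAX_SEGMENTS
// in the kernel), so a queue shorter than that cannot even fill one batch.
const GSO_MAX_SEGMENTS: usize = 64;

// Hypothetical lower bound for the outbound datagram queue length.
const MIN_DATAGRAM_QUEUE_LEN: usize = GSO_MAX_SEGMENTS;
```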

larseggert avatar Aug 14 '25 09:08 larseggert

> For WebTransport traffic, I assume propagating backpressure to the JavaScript context is too difficult.

This is part of the latest W3C spec via a DatagramWritable.

mxinden avatar Nov 27 '25 14:11 mxinden