Vectored I/O with datagrams
I'm adding datagram support to webtransport-quinn. WebTransport uses a session_id VarInt in front of each datagram to disambiguate sessions.
My goal is to mirror the Quinn API but make the session_id prefix transparent to the application, something like:
```rust
pub async fn send_datagram(&self, data: Bytes) -> Result<(), SessionError> {
    // Unfortunately, we need to allocate/copy each datagram because of the Quinn API.
    let mut buf = BytesMut::with_capacity(self.header_datagram.len() + data.len());
    buf.extend_from_slice(&self.header_datagram); // pre-encoded VarInt
    buf.extend_from_slice(&data);
    self.conn.send_datagram(buf.into())?;
    Ok(())
}
```
Unfortunately, with the current Quinn API it's impossible to avoid allocating a copy of every datagram.
- `Bytes` must be contiguous in memory.
- `Bytes` must be on the heap, so I can't make a copy on the stack.
- `Bytes` references are held, so I can't reuse the same `BytesMut` to avoid allocating each time.
The core issue is that there's no vectored I/O for datagrams. This wouldn't be a problem with the stream API, because you can call write multiple times, but it is a problem for the datagram API because the entire payload is needed up front.
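For context, here's a minimal std-only sketch (no Quinn involved) of what vectored I/O buys you on the stream side: the header and payload stay in separate buffers and the writer gathers them in one call, with no combined allocation.

```rust
use std::io::{IoSlice, Write};

/// Gather `header` and `payload` into `sink` with one vectored write,
/// without first allocating a combined buffer.
fn write_datagram(sink: &mut Vec<u8>, header: &[u8], payload: &[u8]) -> std::io::Result<usize> {
    // Vec<u8>'s Write impl writes every IoSlice; a real socket may write fewer.
    sink.write_vectored(&[IoSlice::new(header), IoSlice::new(payload)])
}
```

This is exactly the shape missing from the datagram API: there's no way to hand Quinn two buffers and have it gather them into one datagram.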
I'm not sure allocating a separate Bytes for a VarInt tag will actually improve performance vs. copying, since datagrams have to be fairly small to begin with. Do you have any data?
Sorry, I updated the description with some more information and less of a proposal.
The issue is that I need to allocate a copy of each datagram since there's no way to prepend/append otherwise. I don't have any performance numbers but this would undoubtedly be an issue at high throughput.
I think this is already supported with the current API? A Bytes is:
A cheaply cloneable and sliceable chunk of contiguous memory.
That is, it should be possible to allocate a single Bytes for holding all your datagrams, then split it up and pass each part (which is really a view into the same underlying allocation) into Datagrams::send().
Not quite the same issue. The key thing is that I'm writing a library, so I don't actually create the datagram payload. The application creates the datagram payload and my library prefixes it with the WebTransport header before sending.
You're right that the application could encode the session_id header into each datagram, but I really don't want the application to know anything about WebTransport internals. It's meant to be transparent to the application so the WebTransport API is identical to QUIC.
A similar issue is fragmentation. Let's say I have a 1MB video Bytes blob that I want to fragment into datagrams. You're right that slicing is free, however that doesn't work since each datagram needs some sort of header (ex. sequence number), otherwise reassembly is impossible. You run into the same problem and need to allocate a copy of each datagram on the heap just to encode a small header.
Meanwhile, the streams API lets you avoid this. You can do two separate writes: one for the header and one for the Bytes payload. Or you can use a single write_all_chunks call.
Yeah, that generally makes sense. I'm curious how much the extra API complexity would actually buy us, though. Streams can have much larger individual writes, though I guess in either case you might be sending at most the link capacity, so maybe it's not so different...
I'm open to having this.
Yeah I don't think there's any urgency for this API. I'm not using datagrams and actively advocate against them.
But this would be a blocker for using datagrams at scale. Allocating on the heap for every packet gets expensive. I'm just going to leave a comment in my library and link to this issue.