Deadline Bounded StreamSend
Currently, there is no API to request that MsQuic either fulfill a particular StreamSend within a deadline or not send it at all.
Why is this useful? Real-time and low-latency applications which also send large amounts of data, such as live video streaming and interactive gaming, often generate blocks of data periodically and need each block to be delivered before the next time slot (read: deadline).
In low-latency use cases, especially under variable network conditions (prevalent in the modern landscape of wireless and mobile), it can be extremely detrimental to latency to queue a block onto the wire that cannot be sent within its deadline, because it blocks the send of other data until it is drained (we do not have stream abort/reset or stream priorities below the transport layer).
The feature request can be broken into 2 major parts
- We need some logic at the transport layer to make sure that MsQuic checks if the data can be sent within the deadline.
- An API for the send-side application to set the deadline and receive a callback when it has been guessed that the data will not be sent because the deadline is likely not going to be satisfied.
This feature request does not propose
- Any negotiation of deadlines between sender and receiver; we assume this is done at the application layer, and the deadline is enforced only by the sender, for latency reasons.
- Any explicit signal to the receiver that a block of data will not be sent because the deadline might not be fulfilled. This is simply because I do not wish to formally define what that signal should be. There has been some work in this area (read); please note that it is meant for MP-QUIC, but there is no reason we should not have such a feature. There is also similar work for QUIC, but it seems to have been abandoned (read).
Some approaches which might seem like good alternatives but which I believe are not:
- Have a timer at the application layer and abort the stream: this works only if the data from the stream has not been queued onto the wire by the time the abort occurs. With shorter deadlines such as ~100 ms, as expected in low-latency and real-time applications, we also have much smaller blocks, which will be smaller than the congestion window and are queued onto the wire almost instantaneously.
- Use an unreliable transport protocol: the fact that we drop blocks of data when they cannot be sent might sound like we do not need reliability, but I would argue that is not the case. Consider the use of SVC (Scalable Video Coding), where we encode a raw video stream into hierarchical base and enhancement layers. Assume we have 4 layers in total: if we receive only Layer 1 before the deadline, we decode it and present a 480p video to the user; if we receive Layer 1 and Layer 2 before the deadline, we decode them together and present a 720p video; and so on. For lack of a better term, I will call this "tolerant to unreliability at the block level": the application is okay with receiving 2 out of the 4 layers, but not okay with receiving a corrupted layer, which would be possible when using datagrams (assume each block is a few MTUs in size).
Proposed solution
Before diving into a proposed solution, I would like to clarify that this solution is for a deadline on each StreamSend call, not on the stream as a whole. The stream can live for eternity, but we enforce a condition on each StreamSend that it must be sent within its deadline. Once we have a solution which enforces a deadline on each StreamSend call, it becomes easy to have a deadline for the whole stream by enforcing the deadline on all StreamSend calls on that stream.
We also decide to either send the data in its entirety or not at all; we will not send the data partially.
API for sender application
Introduce a new API in QUIC_API_TABLE named StreamSendWithDeadline with the following signature:
```c
typedef
_IRQL_requires_max_(DISPATCH_LEVEL)
QUIC_STATUS
(QUIC_API * QUIC_STREAM_SEND_WITH_DEADLINE_FN)(
    _In_ _Pre_defensive_ HQUIC Stream,
    _In_reads_(BufferCount) _Pre_defensive_
        const QUIC_BUFFER* const Buffers,
    _In_ uint32_t BufferCount,
    _In_ QUIC_SEND_FLAGS Flags,
    _In_opt_ void* ClientSendContext,
    _In_ TimeDiff TimeToDeadlineInMilliseconds,      // New field
    _In_ QUIC_DEADLINE_SEND_FLAGS DeadlineSendFlags  // New field
    );
```
If the current time is T0, this API sets the deadline to T0 + TimeToDeadlineInMilliseconds. TimeDiff would be equivalent to decltype(operator-(std::chrono::time_point, std::chrono::time_point)); we could instead use size_t or time_t.
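For illustration, calling the proposed API might look like the following sketch; the 100 ms deadline, the `Buffer`/`SendCtx` variables, and the flag choice are placeholders, and the function itself is of course the proposal above, not an existing MsQuic API:

```c
// Hypothetical usage of the proposed StreamSendWithDeadline API.
// MsQuic, Stream, Buffer, and SendCtx are assumed to exist already.
QUIC_STATUS Status =
    MsQuic->StreamSendWithDeadline(
        Stream,
        &Buffer,
        1,                                  // BufferCount
        QUIC_SEND_FLAG_NONE,
        SendCtx,                            // ClientSendContext
        100,                                // TimeToDeadlineInMilliseconds
        QUIC_DEADLINE_SEND_FLAG_CALLBACK);  // ask for the "deadline at risk" event
if (QUIC_FAILED(Status)) {
    // Handle the synchronous failure exactly like a normal StreamSend failure.
}
```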
What happens if MsQuic guesses that the block can not be sent within the deadline?
This depends on the DeadlineSendFlags value set:
```c
typedef enum QUIC_DEADLINE_SEND_FLAGS {
    QUIC_DEADLINE_SEND_FLAG_CALLBACK, // Default
    QUIC_DEADLINE_SEND_FLAG_SILENT,
} QUIC_DEADLINE_SEND_FLAGS;
```
In the case of QUIC_DEADLINE_SEND_FLAG_CALLBACK, a QUIC_STREAM_EVENT callback is indicated with type QUIC_STREAM_EVENT_DEADLINE_POSSIBLY_CAN_NOT_BE_SATISFIED (please suggest a better name) and the following event payload:
```c
struct {
    void* ClientSendContext;
    TimePoint (*GetCurrentMsTime)(void);
    TimeDiff HowManyMsBeforeTheReceiverReceivesTheData;
} DEADLINE_POSSIBLY_CAN_NOT_BE_SATISFIED;
```
If the application returns QUIC_STATUS_SUCCESS from the callback, MsQuic does not send the data.
If the application returns QUIC_STATUS_CONTINUE from the callback, MsQuic continues sending the data even though the deadline is likely to be missed.
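A sketch of how an application might handle this in its stream callback; the event type, its payload, and MY_SEND_CONTEXT are the proposal above plus placeholder application types, not existing MsQuic definitions:

```c
// Application stream callback handling the proposed event (sketch only).
QUIC_STATUS
QUIC_API
StreamCallback(
    _In_ HQUIC Stream,
    _In_opt_ void* Context,
    _Inout_ QUIC_STREAM_EVENT* Event
    )
{
    if (Event->Type == QUIC_STREAM_EVENT_DEADLINE_POSSIBLY_CAN_NOT_BE_SATISFIED) {
        // MY_SEND_CONTEXT is a placeholder for whatever the application passed
        // as ClientSendContext in StreamSendWithDeadline.
        MY_SEND_CONTEXT* SendCtx =
            (MY_SEND_CONTEXT*)Event->DEADLINE_POSSIBLY_CAN_NOT_BE_SATISFIED.ClientSendContext;
        if (SendCtx->BlockIsDroppable) {
            return QUIC_STATUS_SUCCESS;  // agree: MsQuic drops this send request
        }
        return QUIC_STATUS_CONTINUE;     // send anyway, even if the deadline slips
    }
    // ... handle the usual stream events here ...
    return QUIC_STATUS_SUCCESS;
}
```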
Calculating HowManyMsBeforeTheReceiverReceivesTheData
This assumes we have a very good bandwidth and RTT estimate, and computes the value using the following simple heuristic.
If a TimeToDeadlineInMilliseconds exists, then the first time a frame is written:

```
ExpectedDelay = (RTT / 2) + (BytesInFlight + DataSize) / ExpectedBandwidth
// NOTE: BytesInFlight includes frames which are queued to be put into flight
if (CurrentTime + ExpectedDelay > Deadline):
    RetVal = IndicateEvent()
    if (RetVal == QUIC_STATUS_SUCCESS):
        Dequeue the buffer from the `SendRequests` list
        TimeToDeadlineInMilliseconds = std::nullopt
```
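To make the heuristic concrete, here is a self-contained C sketch of the check; all names are placeholders (not existing MsQuic symbols), and the bandwidth estimate is assumed to be in bytes per millisecond:

```c
#include <stdbool.h>
#include <stdint.h>

// Sketch of the deadline check, run the first time a frame is written for a
// send request that carries a TimeToDeadlineInMilliseconds. Placeholder names.
bool
DeadlineLikelyMissed(
    uint64_t CurrentTimeMs,
    uint64_t DeadlineTimePointMs,
    uint64_t SmoothedRttMs,
    uint64_t BytesInFlight,        // includes bytes queued to be put into flight
    uint64_t DataSizeBytes,        // size of this send request
    uint64_t BandwidthBytesPerMs   // congestion control's bandwidth estimate
    )
{
    if (BandwidthBytesPerMs == 0) {
        return false; // no estimate yet; do not drop based on a missing guess
    }
    uint64_t ExpectedDelayMs =
        (SmoothedRttMs / 2) +
        (BytesInFlight + DataSizeBytes) / BandwidthBytesPerMs;
    return CurrentTimeMs + ExpectedDelayMs > DeadlineTimePointMs;
}
```

If this returns true, the event above would be indicated, and on QUIC_STATUS_SUCCESS the request would be dequeued from the `SendRequests` list.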
A few comments:
> An API for the send-side application to set the deadline and receive a callback when it has been guessed that the data will not be sent because the deadline is likely not going to be satisfied.
How is MsQuic supposed to "guess"? Is this deadline based on just putting the data on the wire, or based on an expectation of the one-way delay to the peer, or the round trip time for an ACK? What about loss/retransmits? What about being blocked on flow control or congestion control? Just because we're blocked now doesn't mean things couldn't open in the future.
I think this feature is a good ask, but we must be very crisp on the behavior here. Worst case, MsQuic can/should always come back if we run out of time and give a callback to the app for this.
> I would like to clarify that this solution is for a deadline on each StreamSend call, not on the stream as a whole
You can't cancel a single send, only the stream. So, if we went with this model, if a single send doesn't meet the deadline, the stream would be shut down from there on and you'd have to use a new one. Is that what you're looking for (I assume not)?
> typedef enum QUIC_DEADLINE_SEND_FLAGS { QUIC_DEADLINE_SEND_FLAG_CALLBACK, // Default QUIC_DEADLINE_SEND_FLAG_SILENT, } QUIC_DEADLINE_SEND_FLAGS;
You must have the callback, because to shut down a stream, an error code must be provided. The error code would be the mechanism for the app to provide this.
> This assumes we have a very good bandwidth and RTT estimate
Do you really want to use RTT, or do you want to use a one-way delay (not actually standardized today)?
Let me just add some definitions to make the conversation less confusing:
- DeadlineTimePoint: server-side epoch time before which we want the send to be completed
- DeadlineTimeDiff: how many ms until the deadline, as set by the TimeToDeadlineInMilliseconds parameter in the StreamSendWithDeadline function
> How is MsQuic supposed to "guess"?
A very naive idea is to use `ExpectedDelay = (RTT / 2) + (BytesInFlight + DataSize) / ExpectedBandwidth` and check whether DeadlineTimePoint is before CurrentTime + ExpectedDelay
> Is this deadline based on just putting the data on the wire, or based on an expectation of the one-way delay to the peer
T0: Application calls StreamSendWithDeadline with DeadlineTimeDiff
T0 + DeadlineTimeDiff = DeadlineTimePoint
I believe we need to include the one-way delay and bandwidth estimate in the calculation of the "guess"
> or the round trip time for an ACK?
I prefer the full block being received by the client as our "satisfaction condition" (the event which we want to occur before DeadlineTimePoint), not the server receiving an ACK; to make a good guess we want to minimize variance in what we are measuring.
- T1: block is fully received by the client
- T2: all data frames which transmitted the block have been ACKed

I believe getting a reliable guess for T1 is easier than for T2, because the client might decide to wait to accumulate multiple ACKs.
> What about loss/retransmits?
This is a tough problem that I have been hoping the bandwidth estimate from the congestion control protocol will just magically solve (which is probably too much to hope for). Unfortunately, I have to admit that I have no concrete solution for this yet, except possibly using the full block being ACKed as the satisfaction condition, which would require us to estimate T2, which as mentioned previously isn't something I am in favour of.
> What about being blocked on flow control or congestion control? Just because we're blocked now doesn't mean things couldn't open in the future.
I hadn't thought about this. Would something like a TryToSendLater flag work, where we remove the send request from the queue only after DeadlineTimePoint?
> You can't cancel a single send, only the stream. So, if we went with this model, if a single send doesn't meet the deadline, the stream would be shut down from there on and you'd have to use a new one. Is that what you're looking for (I assume not)?
I would like this feature to be as simple and unsurprising as possible. If we were resetting the stream, what happens to the other objects queued to be sent? We would need to return them to the application, which would probably create another stream and queue the objects onto it; that sounds like an unnecessary amount of effort which I would like to avoid by just dequeuing the send request. MsQuic creating a new stream without an explicit instruction from the application isn't something I am fond of either.
> You must have the callback, because to shut down a stream, an error code must be provided. The error code would be the mechanism for the app to provide this.
If this is a critique of the existence of QUIC_DEADLINE_SEND_FLAG_SILENT, I would like to defend it. I believe MsQuic should not be issuing callbacks for events the application is not interested in. Ideally I would like to see a "subscribe" mechanism for events (something akin to how we do the network statistics callback); since we do not have such an API, I am suggesting QUIC_DEADLINE_SEND_FLAG_[SILENT, CALLBACK] as the way to show interest in QUIC_STREAM_EVENT_DEADLINE_POSSIBLY_CAN_NOT_BE_SATISFIED events.
> Do you really want to use RTT, or do you want to use a one-way delay (not actually standardized today)?
`Path->OneWayDelay` or `Path->OneWayDelayLatest` would be what I need.
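Concretely, that would just replace the RTT/2 term in the earlier sketch with the path's one-way delay estimate (assuming it is available and converted to milliseconds):

```c
// Variant of the earlier heuristic: use a one-way delay estimate instead of
// SmoothedRtt / 2. OneWayDelayMs is assumed to come from the path's estimate;
// the other names are the placeholders from the earlier sketch.
uint64_t ExpectedDelayMs =
    OneWayDelayMs +
    (BytesInFlight + DataSizeBytes) / BandwidthBytesPerMs;
```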
To add, this feature request is going to involve a good amount of research work on getting a good guess, and analysing which methodology provides a good guess cannot really be done unless we actually implement and test different guessing models.
I have started trying to implement this and realised a couple of issues. Discussing the issues and my solution thought process over here.
Streams are byte-ordered
In the previous comments I had ignorantly suggested breaking the byte-ordered property of streams. That is, consider object1 (100 ms deadline) and object2 (no deadline) being queued onto the stream.
My ideas were to either
- cancel object1 if the deadline cannot be satisfied
- send object2 and wait until the deadline to see if the network conditions improve enough to send object1
Both of these are bad in my opinion. I believe @nibanks pointed this out when he said only a stream can be cancelled, not the individual objects queued for sending, but I didn't realise this initially.
I still want to go with (idea 1) and provide an API where a deadline can be set for the whole stream and the contents queued into it; this would allow the stream to be abruptly reset at any point if it is ascertained that the send cannot be continued. I am not confident about an ideal API for setting a deadline on the stream.
I like the idea of SetParam(), but am requesting some consensus on it as it is not atomic with StreamOpen()
Consider the following scenario: StreamOpen -> StreamSend -> many packets queued -> SetParam
Alternatively, we can introduce another API, DeadlineStreamOpen, but that just keeps adding more and more "Deadline" APIs.
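For illustration, the SetParam() option might look roughly like this; QUIC_PARAM_STREAM_DELIVERY_DEADLINE is a made-up parameter name, not an existing MsQuic parameter:

```c
// Hypothetical stream-level deadline set via SetParam (parameter name made up).
uint64_t DeadlineMs = 100; // deadline relative to now, in milliseconds
QUIC_STATUS Status =
    MsQuic->SetParam(
        Stream,
        QUIC_PARAM_STREAM_DELIVERY_DEADLINE,
        sizeof(DeadlineMs),
        &DeadlineMs);
```

The non-atomicity concern above is exactly that data could already have been queued between StreamOpen/StreamSend and this call.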
A more fundamental question? Why not use a new stream for each send? Then the whole thing works or the whole stream is cancelled.
> A more fundamental question? Why not use a new stream for each send? Then the whole thing works or the whole stream is cancelled.
That basically requires MsQuic to say that StreamSendWithDeadline should always be called with QUIC_SEND_FLAG_FIN.
I do not like that idea
Consider the scenario where you need to forward a very long river of data, should the relay wait for all the data to be received before forwarding it? It'd be much better for it to forward data in chunks of say 1MB as it receives it.
Publisher -> Relay -> Subscriber
Should we use a different stream for each 1MB chunk?
- Receiver needs to synchronize all the streams and sort the data
- Relay needs to create a new stream for each chunk, package data into 1MB chunks, and send it to MsQuic, which again is going to package it into frames/packets
In this case the relay might want to open a stream using DeadlineStreamOpen, keep queuing data onto the stream as it receives it, and force the stream to be closed if it believes the data can't be sent.
Hi @Johan511, thanks for all the details and the work on the PR.
I think I am not convinced either that this is a feature that should be added to MsQuic though. It breaks the typical contract of a reliable transport, where data queued to be sent is reliably transferred (or the stream/connection is terminated), for what seems a very specific use case.
To me, this sounds more like application layer logic: the same way MsQuic should "guess" if the data will make it in time through the OS transport stack + the network, the application could "guess" if the data will make it in time to the peer.
The MsQuic-level guess is also based on a transport-to-transport delay, while it sounds like what is actually needed is an app-to-app delay.
I think that implementing this logic in the application (potentially extending MsQuic to expose statistics needed to compute the delays) would be a better and more generic solution.
Hi @guhetier, I understand your concerns about the feature and about its maintenance, and agree the admins should have the final say in what gets merged into the library.
This was a specific issue I faced while implementing MOQT (in particular the DELIVERY_TIMEOUT feature) and I haven't seen a solution for it anywhere else. I am perfectly content with merging it into my fork of MsQuic and continuing to use it there.
However, I would like to make some arguments.
> It breaks the typical contract of a reliable transport, where data queued to be sent is reliably transferred (or the stream/connection is terminated), for what seems a very specific use case.
This feature is not very different from CANCEL_ON_LOSS; most of the implementation and testing code was written taking CANCEL_ON_LOSS as an example (see the sketch after the list below).
In both cases
- we break the reliable transport contract
- there's an event which causes the stream to be reset
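For comparison, the existing CANCEL_ON_LOSS behavior is opted into per send via a flag, roughly as below (see the MsQuic docs for the exact event payload):

```c
// Existing MsQuic behavior used as the model: if data from this send is lost,
// the stream is reset instead of the lost data being retransmitted.
QUIC_STATUS Status =
    MsQuic->StreamSend(
        Stream,
        &Buffer,
        1,
        QUIC_SEND_FLAG_CANCEL_ON_LOSS,
        SendCtx);
// The application is then notified via a QUIC_STREAM_EVENT_CANCEL_ON_LOSS
// event, where it supplies the error code used to reset the stream.
```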
> To me, this sounds more like application layer logic: the same way MsQuic should "guess" if the data will make it in time through the OS transport stack + the network, the application could "guess" if the data will make it in time to the peer.
Initially I was against the idea of having MsQuic make the "guess" because I believed the application would have more knowledge of the overall network scenario and might be able to make a better guess. But when I started implementing this at an application layer I realised I was incorrect.
I already have a hacky commit on my local fork for this, where the application does this heuristic check when the first byte from the stream is being written into a frame. The problem with this approach is that for the heuristic to be accurate we need more information about the internal congestion control mechanism. For example, in the PR you can see we use information about the BBR state; I am also experimenting with using the pacing control parameters to get a more accurate heuristic.
Exposing all this information to the application doesn't seem wise in my opinion as it might restrict the changes we can make in the future.
> The MsQuic-level guess is also based on a transport-to-transport delay, while it sounds like what is actually needed is an app-to-app delay.
I am using SmoothedRtt provided by QuicCongestionControlGetNetworkStatistics, which is derived from AckDelay (from the frame header) and PacketRtt, defined as:
```c
uint64_t PacketRtt = CxPlatTimeDiff64(PacketMeta->SentTime, TimeNow);
```
To me that looks like a good estimate for the app-to-app delay due to the propagation, queuing, and processing components. For the transmission delay component we are using the bandwidth measurement, which is quite accurate in the case of BBR.
> I think that implementing this logic in the application (potentially extending MsQuic to expose statistics needed to compute the delays) would be a better and more generic solution.
This has been a feature request for a couple of months and I have been brainstorming for a generic solution, but whichever generic solution we do provide will lead to:
- More stream level callbacks such as the one discussed here
- possibly exposing internal congestion control information such as BBR state, pacing rate, etc.
Taking the liberty to quote @nibanks
> My preference here would be to not expose this, and instead implement the deadline scheduling logic inside MsQuic. This exposes too much internals IMO which can cause perf issues (very frequent callbacks) as well as fragility (what if we offload this layer to HW in the future?).
QUIC provides a very cheap way of creating and destroying streams, but what it lacks is reliable aborts for low-latency applications which use small objects. This is not a critique of QUIC but a simple fact I am pointing out. Small objects are very likely to be dumped onto the wire while the congestion window is large, and given that for obvious performance reasons congestion windows are often much larger than the BDP, dumping these objects onto the wire only means they block future objects, leading to significant increases in delay if we later decide that some of the objects are not necessary and need to be aborted (because we can't yank data back off the wire).
Inserting a latency graph to show the severity of the issue of dumping objects onto the wire which the application later decides to abort. We expect latencies of the order of 100 ms, but they ballooned to 10 seconds.
Thanks for pointing me to that earlier PR, I wasn't aware the discussion started that way :D I'll take a closer look and keep that extra info in mind; feel free to ignore my earlier comment if that approach was already rejected.
For reference, this seems loosely related to: https://datatracker.ietf.org/doc/html/draft-tjohn-quic-multipath-dmtp-01
While this focuses on multipath, it has a subset of functionality for single-path QUIC.