Consider implementing Spurious Fast Retransmits for CUBIC
As discussed in #2535
From the RFC:
A QUIC implementation can easily determine a spurious fast retransmit if a QUIC packet is acknowledged after it has been marked as lost and the original data has been retransmitted with a new QUIC packet.
When packet loss is detected via acknowledgments, a CUBIC implementation MAY save the current value of the following variables before the congestion window is reduced.
[ ... ]
Once the previously declared packet loss is confirmed to be spurious, CUBIC MAY restore the original values of the above-mentioned variables [...]. Restoring the original values ensures that CUBIC's performance is similar to what it would be without spurious losses.
https://datatracker.ietf.org/doc/html/rfc9438#name-spurious-fast-retransmits
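For orientation, a minimal sketch of what the save/restore described above could look like. The type and method names are assumptions, not neqo's (or quiche's) actual API, and the set of saved fields is only illustrative, since the RFC's exact variable list is elided in the quote above:

```rust
use std::time::Instant;

/// Snapshot of CUBIC state taken when loss is detected, so it can be restored
/// if the loss later turns out to be spurious (RFC 9438, Section 4.9).
/// All names here are illustrative.
struct SavedCubicState {
    cwnd: usize,
    ssthresh: usize,
    w_max: f64,
    k: f64,
    epoch_start: Option<Instant>,
}

struct Cubic {
    cwnd: usize,
    ssthresh: usize,
    w_max: f64,
    k: f64,
    epoch_start: Option<Instant>,
    saved: Option<SavedCubicState>,
}

impl Cubic {
    /// MAY be called when loss is detected via acknowledgments, before the
    /// congestion window is reduced.
    fn save_state_on_congestion_event(&mut self) {
        self.saved = Some(SavedCubicState {
            cwnd: self.cwnd,
            ssthresh: self.ssthresh,
            w_max: self.w_max,
            k: self.k,
            epoch_start: self.epoch_start,
        });
    }

    /// MAY be called once the declared loss is confirmed to be spurious, i.e.
    /// an ACK arrives for a packet that was already marked lost and whose data
    /// has been retransmitted in a new packet.
    fn restore_state_on_spurious_congestion_event(&mut self) {
        if let Some(saved) = self.saved.take() {
            self.cwnd = saved.cwnd;
            self.ssthresh = saved.ssthresh;
            self.w_max = saved.w_max;
            self.k = saved.k;
            self.epoch_start = saved.epoch_start;
        }
    }
}
```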
quiche implements it: https://github.com/cloudflare/quiche/blob/master/quiche/src/recovery/congestion/cubic.rs#L207C1-L228C6
so does picoquic: https://huitema.wordpress.com/2019/11/11/implementing-cubic-congestion-control-in-quic/#spurious
and see the original discussion on including it in the draft: https://ntap.github.io/rfc8312bis/issues.html (cannot properly link; search for "spurious" on that page). That page also mentions Linux TCP implementing it partially, but I couldn't find it in the code at first glance.
Yes, we should do this. It should help a lot if there is packet reordering, which I think @KershawChang experienced in the Berlin office, so you should have a test case for it locally.
(I also find it interesting that NetApp deleted all my repos under github.com/ntap but somehow the associated web sites are still accessible!)
Alright, then I'll tackle this after (or during if it makes more sense, but as a separate PR ideally) #2535. And yes, kind of a shame that the repos don't exist anymore for viewing those discussions. Luckily the websites are still there, for whatever reason :)
I am in favor of this feature.
I assume this implementation introduces some non-negligible complexity into our congestion controller, as it may move back in time, i.e. restore the pre-congestion-event state at any later point. We will no longer be able to depend on the standard SlowStart -> CongestionAvoidance -> RecoveryStart -> Recovery -> CongestionAvoidance cycle.
May I suggest, before doing the actual implementation in neqo-transport/src/cc/, landing a metric counting the number of spurious re-transmits we see in the wild? We can then record the Stats::spurious_re_transmit / Stats::lost ratio in Glean.
https://github.com/mozilla/neqo/blob/c535de9c66b7f99a50e5da12888c324434358d23/neqo-transport/src/stats.rs#L252-L253
May I suggest, before doing the actual implementation in neqo-transport/src/cc/, landing a metric counting the number of spurious re-transmits we see in the wild? We can then record the Stats::spurious_re_transmit / Stats::lost ratio in Glean.
If I read the code correctly, we already have Stats::late_ack, which basically counts spurious re-transmits.
https://github.com/mozilla/neqo/blob/b3d8f0d21db5657ebafb14f9b60c8892d2eb3aa9/neqo-transport/src/stats.rs#L254-L255
https://github.com/mozilla/neqo/blob/b3d8f0d21db5657ebafb14f9b60c8892d2eb3aa9/neqo-transport/src/recovery/mod.rs#L267-L272
So it should be easy to record the above mentioned Stats::late_ack / Stats::lost ratio into Glean (though I'll have to look into how that works exactly, I'm sure I'll easily find an example PR).
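For concreteness, the ratio itself would just be something like the following. This is a hypothetical helper; late_ack and lost are the existing Stats counters, the function is only illustrative:

```rust
/// Hypothetical helper; `late_ack` and `lost` are existing counters on
/// `neqo_transport::stats::Stats`, the function itself is only illustrative.
fn late_ack_to_lost_ratio(stats: &neqo_transport::stats::Stats) -> f64 {
    if stats.lost == 0 {
        return 0.0;
    }
    stats.late_ack as f64 / stats.lost as f64
}
```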
Before doing that, what kind of ratio are we looking for, i.e. at what point would implementing spurious fast re-transmit handling be worth it? For me at least, I don't really know what to expect from that metric. Are we just looking for an indicator that late acks are happening at all?
Is it correct that all stats are already exposed to neqo_glue and all that's needed to record something to glean is to add a probe and record it in record_stats_in_glean (searchfox)?
Looking at this phabricator revision where @mxinden introduced ECN metrics (and the Glean exposure in general).
@mxinden suggested it might make sense to track late_ack / congestion_event instead of late_ack / lost, as multiple lost packets could be part of the same congestion event (the first triggering it, the others not leading to another decrease because we are in State::Recovery).
I think that's more accurate, but we'd have the same problem with late_ack, too: we could have multiple late acks belonging to the same spurious congestion event, each counting towards the metric.
If we want the metric to be really accurate, we would need something like spurious_congestion_event / congestion_event, where spurious_congestion_event only counts one late_ack per congestion event and congestion_event only counts one lost packet per congestion event. That would basically give us the ratio of cwnd reductions that could have been avoided/restored.
For the second part it's pretty easy: we can just count calls to reduce_cwnd in on_congestion_event.
The first part is a bit trickier; maybe we could add a flag spurious_event_counted and check for pkt.lost() && !self.spurious_event_counted here to make sure we only count one ack per congestion event.
https://github.com/mozilla/neqo/blob/b3d8f0d21db5657ebafb14f9b60c8892d2eb3aa9/neqo-transport/src/cc/classic_cc.rs#L187-L216
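Roughly what I have in mind, with toy stand-ins for neqo's types (everything here is illustrative, not the actual code):

```rust
/// Toy stand-ins for neqo's `SentPacket` and `Stats`; only what is needed to
/// illustrate the counting is included.
struct SentPacket {
    declared_lost: bool,
}

impl SentPacket {
    fn lost(&self) -> bool {
        self.declared_lost
    }
}

#[derive(Default)]
struct Stats {
    congestion_event: usize,
    spurious_congestion_event: usize,
}

#[derive(Default)]
struct ClassicCongestionControl {
    spurious_event_counted: bool,
    stats: Stats,
}

impl ClassicCongestionControl {
    /// Called from ACK processing (the linked `on_packets_acked` region).
    fn count_spurious(&mut self, pkt: &SentPacket) {
        // An ACK for a packet we already declared lost means the congestion
        // event it triggered was spurious. Count at most once per event.
        if pkt.lost() && !self.spurious_event_counted {
            self.spurious_event_counted = true;
            self.stats.spurious_congestion_event += 1;
        }
    }

    /// Called once per congestion event, i.e. wherever `reduce_cwnd` is
    /// called from `on_congestion_event`.
    fn count_congestion_event(&mut self) {
        self.stats.congestion_event += 1;
        // A new congestion event starts a new epoch that can again be counted
        // as spurious at most once.
        self.spurious_event_counted = false;
    }
}
```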
We should also track that we properly declare and end congestion events. There are some older tickets from @mb that indicate we might not always.
Is it correct that all stats are already exposed to neqo_glue and all that's needed to record something to glean is to add a probe and record it in record_stats_in_glean (searchfox)?
Yes. neqo_glue has access to neqo_transport::stats::Stats:
https://github.com/mozilla/neqo/blob/b3d8f0d21db5657ebafb14f9b60c8892d2eb3aa9/neqo-transport/src/stats.rs#L236-L238
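So recording it might end up being a couple of lines in record_stats_in_glean along these lines. This is only a sketch: the metric names and the firefox_on_glean module path are assumptions and would first need corresponding entries in metrics.yaml:

```rust
// Hypothetical addition to neqo_glue's record_stats_in_glean; the metric
// names and the firefox_on_glean module path are assumptions and would need
// to be defined in metrics.yaml first.
fn record_spurious_retransmit_stats(stats: &neqo_transport::stats::Stats) {
    use firefox_on_glean::metrics::networking;

    // Recording the raw counters lets the late_ack / lost (or the per-event)
    // ratio be computed during analysis instead of on the client.
    networking::http_3_late_ack.add(stats.late_ack as i32);
    networking::http_3_packets_lost.add(stats.lost as i32);
}
```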
The first part is a bit trickier; maybe we could add a flag spurious_event_counted and check for pkt.lost() && !self.spurious_event_counted here to make sure we only count one ack per congestion event.
Sounds good. Minor suggestion: consider adding a spurious_event_encountered: bool to the following two enum variants:
https://github.com/mozilla/neqo/blob/b3d8f0d21db5657ebafb14f9b60c8892d2eb3aa9/neqo-transport/src/cc/classic_cc.rs#L37-L42
@mxinden
Sounds good. Minor suggestion: consider adding a spurious_event_encountered: bool to the following two enum variants:
Why those two specifically? We can also encounter a spurious event while in congestion avoidance:
1. Packet 1 is lost -> RecoveryStart
2. Packet 2 is sent -> Recovery
3. Packet 2 is acked -> CongestionAvoidance
... any number of packets can be sent and acked ...
4. Packet 1 is acked -> Spurious congestion event while in CongestionAvoidance
Thus I think the best way to track whether we already encountered a spurious congestion event in the current epoch is to add a boolean flag ClassicCongestionControl::had_spurious_congestion_event.
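Something like the following is what I mean. The State variants are the ones mentioned in this thread; all other names are illustrative, not the actual classic_cc.rs code:

```rust
/// Variant list abridged to the states mentioned in this thread.
enum State {
    SlowStart,
    CongestionAvoidance,
    RecoveryStart,
    Recovery,
    // (other variants elided)
}

struct ClassicCongestionControl {
    state: State,
    // Lives on the struct rather than on the Recovery variants, because the
    // ACK for the originally "lost" packet can arrive after we are already
    // back in CongestionAvoidance (the packet 1 / packet 2 sequence above).
    had_spurious_congestion_event: bool,
}

impl ClassicCongestionControl {
    fn on_congestion_event(&mut self) {
        // A new congestion event starts a new epoch, so reset the flag here
        // regardless of the current state.
        self.had_spurious_congestion_event = false;
        self.state = State::RecoveryStart;
        // ... reduce cwnd, bump the congestion_event counter, etc. ...
    }

    fn on_ack_of_declared_lost_packet(&mut self) {
        // Count (and, once implemented, restore the CUBIC state) at most once
        // per epoch.
        if !self.had_spurious_congestion_event {
            self.had_spurious_congestion_event = true;
            // ... bump the spurious_congestion_event counter ...
        }
    }
}
```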
@larseggert
We should also track that we properly declare and end congestion events. There are some older tickets from @mb that indicate we might not always.
I'm guessing that for you, declaring a congestion event means entering RecoveryStart and ending a congestion event means exiting Recovery / entering the next CongestionAvoidance, correct?
I've briefly looked at the tickets (specifically #1472 and #1473) and I think it'd make sense to properly look at them again at some point. Manuel sits next to me in the office, so I'll chat with him when I get a chance.
I don't think this necessarily belongs to this issue or the following PR though, right?
For what it is worth, the same effort on the Quinn project shows promising numbers:
https://github.com/quinn-rs/quinn/pull/2289
Suggestion by @kazuho at IETF 124:
Instead of recovering from a spurious re-transmit on detection, consider increasing packet threshold and/or kTimeThreshold.
https://github.com/mozilla/neqo/blob/50612f06c4a685be567f25cc48f901b63c2b7c30/neqo-transport/src/recovery/mod.rs#L37
https://github.com/mozilla/neqo/blob/50612f06c4a685be567f25cc48f901b63c2b7c30/neqo-transport/src/rtt.rs#L169
On connections with repeated high packet re-ordering, this would prevent the congestion event in the first place.
Instead of recovering from a spurious re-transmit on detection, consider increasing packet threshold and/or kTimeThreshold. On connections with repeated high packet re-ordering, this would prevent the congestion event in the first place.
Yes, we do not have empirical data to show; however, our experience has been that reducing sensitivity to reordering works well enough as a fix.
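For reference, an adaptive version of that idea might look roughly like the sketch below (similar in spirit to how Linux's RACK implementation widens its reordering window). All names here are assumptions; only the initial value of 3 comes from RFC 9002's kPacketThreshold, and neqo's actual constants are the PACKET_THRESHOLD and time-threshold values linked above.

```rust
/// Illustrative sketch only; names and the cap are assumptions.
const INITIAL_PACKET_THRESHOLD: u64 = 3; // RFC 9002 kPacketThreshold
const MAX_PACKET_THRESHOLD: u64 = 64; // arbitrary cap for illustration

struct LossDetection {
    packet_threshold: u64,
}

impl LossDetection {
    fn new() -> Self {
        Self {
            packet_threshold: INITIAL_PACKET_THRESHOLD,
        }
    }

    /// Called when an ACK arrives for a packet we had already declared lost
    /// (a late ack). Instead of, or in addition to, restoring the CUBIC
    /// state, become less eager to declare loss on this connection so that
    /// repeated reordering does not trigger the congestion event at all.
    fn on_late_ack(&mut self, reorder_distance: u64) {
        self.packet_threshold = self
            .packet_threshold
            .max(reorder_distance + 1)
            .min(MAX_PACKET_THRESHOLD);
    }
}
```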