neqo
Implement RACK variation for QUIC
Highly WIP: early draft, already working, but not necessarily the final version. No need to read through it yet unless you are interested. Submitting it now because I will probably only continue next week.
This is a modified version of RACK for QUIC. The high-level logic works, but the algorithm differs somewhat from the RFC because QUIC has more information available than TCP.
What it does:
- when packet reordering is detected: raise `reo_wnd_mult` high enough that the same amount of packet reordering is not detected as packet loss next time
- after a loss was detected, when entering `CongestionAvoidance`: reduce `reo_wnd_persist` by one and set `reo_wnd_mult` to the next non-zero entry of `reo_wnd_mult`.
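The two steps above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `ReorderState`, `on_reordering`, `on_enter_congestion_avoidance` and the `needed_mult` parameter are hypothetical names, and the reset of `reo_wnd_mult` to 1 once `reo_wnd_persist` reaches zero is an assumption borrowed from RFC 8985 rather than the "next non-zero entry" rule described above.

```rust
/// Hypothetical state holder; field names follow RFC 8985, not neqo's code.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ReorderState {
    /// Multiplier applied to the base reordering window.
    pub reo_wnd_mult: u32,
    /// Number of recoveries the enlarged window still survives.
    pub reo_wnd_persist: u32,
}

impl ReorderState {
    /// Persistence value used by RFC 8985.
    pub const PERSIST_INIT: u32 = 16;

    pub fn new() -> Self {
        Self {
            reo_wnd_mult: 1,
            reo_wnd_persist: Self::PERSIST_INIT,
        }
    }

    /// Reordering was observed. `needed_mult` is the smallest multiplier
    /// that would have tolerated it (how it is computed is out of scope here).
    pub fn on_reordering(&mut self, needed_mult: u32) {
        // Never shrink the window in response to reordering.
        self.reo_wnd_mult = self.reo_wnd_mult.max(needed_mult);
        self.reo_wnd_persist = Self::PERSIST_INIT;
    }

    /// Called when congestion avoidance is entered after a detected loss.
    pub fn on_enter_congestion_avoidance(&mut self) {
        self.reo_wnd_persist = self.reo_wnd_persist.saturating_sub(1);
        if self.reo_wnd_persist == 0 {
            // Assumption: RFC 8985 style reset to the initial multiplier.
            self.reo_wnd_mult = 1;
        }
    }
}
```

With this shape, an enlarged window survives 16 recoveries without new reordering before it falls back to the initial multiplier.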
I am still considering making the implementation "dumber", and maybe simpler and more readable, by following the RFC more closely.
`reo_wnd_mult += 1` should only happen once per RTT, so extra state is needed for that.
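One possible shape for that state, as a minimal sketch: remember when the multiplier was last increased and refuse another increase until one RTT has passed. `MultIncreaseGuard` and `try_increase` are hypothetical names, not part of this PR or of neqo.

```rust
use std::time::{Duration, Instant};

/// Hypothetical guard that limits multiplier increases to one per RTT.
#[derive(Debug, Default)]
pub struct MultIncreaseGuard {
    /// Time of the last permitted increase, if any.
    last_increase: Option<Instant>,
}

impl MultIncreaseGuard {
    /// Returns `true` and records `now` if at least one `rtt` has passed
    /// since the last permitted increase (or if there was none yet).
    pub fn try_increase(&mut self, now: Instant, rtt: Duration) -> bool {
        let allowed = match self.last_increase {
            None => true,
            Some(prev) => now.saturating_duration_since(prev) >= rtt,
        };
        if allowed {
            self.last_increase = Some(now);
        }
        allowed
    }
}
```

The caller would then wrap the increment, for example `if guard.try_increase(now, rtt) { reo_wnd_mult += 1; }`.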
Still TODO:
- proper variable names (not those from the RFC)
- not sure if returning `bool` from `on_packets_acked` is fine
- debug prints instead of `println` for print statements that should stay
Differences to TCP RACK (of next uploaded version):
- initial timeout at `9/8 RTT` instead of `5/4 RTT` to keep the
- `reorder_window_mult` adds fractions of `1/8 RTT` instead of `1/4 RTT` for no particular reason (except that it makes the code more consistent with the initial timeout of `9/8 RTT`)
- out-of-order packets that weren't causing spurious retransmits still reset `reorder_window_persist` to 16. TCP doesn't have the necessary information to do this
- `reorder_window_mult` is set high enough to prevent a spurious retransmit next time, instead of just increasing by one. This can be done because we have more RTT estimates for packets that would have been spuriously retransmitted in TCP.
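To make the arithmetic in this list concrete, here is a rough sketch of a loss threshold built from eighths of an RTT, plus a helper that picks a multiplier large enough to cover an observed delay. Both `loss_delay` and `required_mult` are hypothetical helpers written for illustration. They assume the threshold is `RTT + mult * RTT/8` with `mult` starting at 1, which matches the `9/8 RTT` initial timeout but is not confirmed against the PR's code.

```rust
use std::time::Duration;

/// Loss threshold: one RTT plus `mult` eighths of an RTT.
/// With `mult == 1` this is 9/8 RTT.
pub fn loss_delay(rtt: Duration, mult: u32) -> Duration {
    rtt + rtt * mult / 8
}

/// Smallest multiplier (at least 1) for which `loss_delay(rtt, mult)`
/// is not below `observed`, the send-to-ack delay of a reordered packet.
pub fn required_mult(rtt: Duration, observed: Duration) -> u32 {
    let step = rtt.as_nanos() / 8;
    if step == 0 {
        // RTT too small to split into eighths; keep the initial multiplier.
        return 1;
    }
    let excess = observed.as_nanos().saturating_sub(rtt.as_nanos());
    // Round up so the new threshold covers the observed delay.
    let mult = (excess + step - 1) / step;
    // Clamp into u32 range before narrowing, then enforce the minimum of 1.
    (mult.min(u128::from(u32::MAX)) as u32).max(1)
}
```

For example, with an RTT of 80 ms the initial threshold is 90 ms, and a reordered packet acknowledged after 105 ms would call for a multiplier of 3 (threshold 110 ms), where a plain increment would only reach 2 (threshold 100 ms) and would still misclassify the same reordering.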
Tagging @larseggert for thoughts. (Looks like there is a rebase to manage here.)
RACK and/or other CC improvements are attractive areas of work, but I think we first need to be sure that the underlying machinery (timers, buffers, loss recovery logic, etc.) is working as intended, and that is probably easiest with a standard RFC9002 implementation. We also need a robust performance testing setup, because otherwise we won't really be able to quantify the improvements that proposed changes like this one will bring.
In other words, I think we should put this and similar improvements on the back burner and first get the plumbing fixed and performance scaffolding done.