
Tracking issue: ACK frequency optimizations

Open toidiu opened this issue 3 years ago • 0 comments

Testing showed that Ack delay on the peer yields marginal gains, so more concrete test data is needed prior to implementing this feature.


As proposed originally in https://datatracker.ietf.org/doc/html/draft-ietf-quic-ack-frequency, reducing the frequency of ACK processing can yield CPU savings.

Concerns

There are two main concerns with delaying ACKs in QUIC, since recovery/loss-detection relies on the ACK signal:

  • 'low freq' will delay packet recovery mechanisms (loss)
  • 'low freq' can result in inaccurate congestion control (ECN, BBR)

Since these mechanisms rely on a constant ACK signal, production testing is required to help tune the ACK delay:

  • what is the minimum usable ACK frequency (it needs to be variable, based on network characteristics) for CC algorithms such as BBR, which rely on ACKs to build a model of the network? (bandwidth_freq: in the most general case it requires high-precision (microsecond-granularity or better) timestamps on the sender.)
  • what is the minimum ACK frequency (it needs to be variable, based on network characteristics) needed to ensure quick loss recovery?

Solutions

Testing showed that Ack delay on the peer yields marginal gains, so more specific testing (with a more concrete use case) needs to be done prior to implementing any of the features below. Single round batching did not yield performance gains.

Here is a summary of the 3 proposed solutions:

  • Batch ACK Processing (single round): testing showed that this failed to yield performance benefits. ~(MVP, local receive optimized, low downsides, low effort) https://github.com/aws/s2n-quic/issues/1277~
  • Delay Receive ACK Processing (multi-round) (local receive optimized, complicated, medium/high effort)
  • Impl delayed_freq_rfc (peer dependent, receive/send optimized, complicated, medium/high effort)

Batch ACK Processing (single round) MVP

Currently s2n-quic processes acks from each packet individually. This solution proposes that ACK information be batched across all packets in a single round and only be processed once.

  • pros:
    • ack processing consumes ~20% CPU
      • FAILED to show improvement: CPU usage went up https://github.com/aws/s2n-quic/pull/1298
    • bytes/instruction (150k → 125k)
    • syscalls savings (20k → 6k)
    • This solution does not affect loss recovery (LR) or congestion control (CC)!
  • cons:
    • possibly overly simplistic and leaves gains on the table
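
As a rough illustration of the single-round idea, here is a minimal sketch that collects the ACK ranges seen in every packet of one receive round and merges them so that downstream processing runs once. `AckBatch`, `on_ack_range` and `finish` are hypothetical names made up for this example and are not part of the s2n-quic API; the real implementation attempt is in the PR linked above.

```rust
use std::ops::RangeInclusive;

/// Hypothetical accumulator (not an s2n-quic type): collects the ACK ranges
/// from every packet in a single receive round.
#[derive(Default)]
struct AckBatch {
    ranges: Vec<RangeInclusive<u64>>,
}

impl AckBatch {
    /// Record one acknowledged packet-number range from an ACK frame.
    fn on_ack_range(&mut self, range: RangeInclusive<u64>) {
        self.ranges.push(range);
    }

    /// Drain the batch, merging overlapping or adjacent ranges so that the
    /// caller can process the combined ACK information once.
    fn finish(&mut self) -> Vec<RangeInclusive<u64>> {
        let mut ranges = std::mem::take(&mut self.ranges);
        ranges.sort_by_key(|r| *r.start());

        let mut merged: Vec<RangeInclusive<u64>> = Vec::new();
        for range in ranges {
            if let Some(last) = merged.last_mut() {
                // Overlapping or directly adjacent to the previous range.
                if *range.start() <= (*last.end()).saturating_add(1) {
                    if range.end() > last.end() {
                        *last = *last.start()..=*range.end();
                    }
                    continue;
                }
            }
            merged.push(range);
        }
        merged
    }
}
```

In this sketch the caller would invoke `on_ack_range` for each ACK frame in the round and hand the result of `finish` to loss recovery and congestion control in one pass.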

Delay Receive ACK Processing (multi-round)

This solution is equivalent to applying the delayed ack RFC locally on the receive side by artificially aggregating the ack information and consuming it in a delayed manner.

  • pro:
    • Allow for batching ACKs based on time/packet thresholds
    • Independent of peer and therefore more predictable
    • A stepping stone towards impl delayed_freq_rfc
  • con:
    • Delays the ACK signal that the local CC and LR components rely on, which could result in degraded performance
    • Will require data and experimentation to fine-tune
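
The time/packet thresholds could look something like the sketch below. This only shows the flush decision, under the assumption that pending ACK information is processed once either a packet count or a maximum delay is reached. `AckDelayPolicy`, `PendingAcks` and the field names are hypothetical, and any threshold values would have to come from the experimentation mentioned above.

```rust
use std::time::{Duration, Instant};

/// Hypothetical thresholds (not an s2n-quic type); real values need tuning.
struct AckDelayPolicy {
    max_packets: usize,
    max_delay: Duration,
}

/// Hypothetical tracker for ACK information that was received but has not
/// yet been handed to loss recovery and congestion control.
struct PendingAcks {
    policy: AckDelayPolicy,
    pending_packets: usize,
    first_pending_at: Option<Instant>,
}

impl PendingAcks {
    fn new(policy: AckDelayPolicy) -> Self {
        Self { policy, pending_packets: 0, first_pending_at: None }
    }

    /// Record ACK information from one packet. Returns true when the
    /// accumulated information should be processed now.
    fn on_packet(&mut self, now: Instant) -> bool {
        self.pending_packets += 1;
        if self.first_pending_at.is_none() {
            self.first_pending_at = Some(now);
        }
        self.should_flush(now)
    }

    /// True once either the packet threshold or the time threshold is hit.
    fn should_flush(&self, now: Instant) -> bool {
        if self.pending_packets >= self.policy.max_packets {
            return true;
        }
        match self.first_pending_at {
            Some(first) => now.saturating_duration_since(first) >= self.policy.max_delay,
            None => false,
        }
    }

    /// Reset the tracker and return how many packets were batched.
    fn flush(&mut self) -> usize {
        self.first_pending_at = None;
        std::mem::take(&mut self.pending_packets)
    }
}
```

A timer armed for `first_pending_at + max_delay` would be needed as well, so that a quiet connection still flushes; that part is left out here.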

Impl delayed_freq_rfc

This is the only solution which allows an endpoint to influence the peer behavior. By relaxing the peer's ack frequency requirements, it is possible to improve the peer's throughput, spend less time per connection and serve more data.

  • pro:
    • Configurable thresholds (time/packet) mean it's possible to influence/relax peer requirements.
    • Escape hatch with the Immediate ACK mechanism
    • Configurable ECN and out-of-order packet behavior (also something we can tweak in above solution)
    • Allows for setting a min_ack_delay value
    • Can influence the peer ACK rate, which means less processing on the receive side
  • con:
    • Requires the peer to support/negotiate/use the extension
    • Could require complicated logic to compute the optimal ACK thresholds
    • Delays the ACK signal that the local CC and LR components rely on, which could result in degraded performance
    • Will require data and experimentation to fine-tune
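
To make the peer dependency concrete, here is a small sketch of how an endpoint might pick the ACK delay it asks the peer to use. It assumes, based on my reading of the draft, that the peer advertises a `min_ack_delay` value when it supports the extension and that a requested delay must not be smaller than that value. The function name and signature are hypothetical, not s2n-quic or draft-defined API.

```rust
use std::time::Duration;

/// Hypothetical helper: decide which max ACK delay to request from the peer.
///
/// Returns `None` when the peer did not advertise `min_ack_delay`, in which
/// case the extension cannot be used and the default ACK behavior applies.
fn requested_max_ack_delay(
    desired: Duration,
    peer_min_ack_delay: Option<Duration>,
) -> Option<Duration> {
    // Assumption: never request a delay below what the peer says it supports.
    peer_min_ack_delay.map(|min| desired.max(min))
}
```

Choosing `desired` is the hard part (see the con about computing optimal thresholds); this sketch only covers the negotiation fallback and the clamp.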

toidiu · Apr 26 '22 17:04