aws-sdk-rust icon indicating copy to clipboard operation
aws-sdk-rust copied to clipboard

Make SO_NODELAY configurable (or default to true) as it massively reduces Lambda invocation latency

Open jackkleeman opened this issue 1 year ago • 4 comments

Describe the feature

The h2 library that backs this SDK seems to put HEADERS and DATA frames into separate TCP packets when payloads are large. That's fine, but in combination with Nagle's algorithm, it means that the second packet containing the DATA frame will not leave the client machine until an ACK is received for the packet containing the HEADERS frame. In my tests this appears to add substantial latency to Lambda invocations. Perhaps on the AWS side there is tcp delayed acknowledgement going on?

This can be resolved by setting SO_NODELAY (this can be set via a method on the hyper HttpConnector). In general this flag reduces latency at the expense of potentially more packets, but in this case it seems to make no difference to the number of packets, but massively reduces latency. In Go this flag defaults to true.

Benchmarks against a hello world Lambda

delays/delay            time:   [90.039 ms 91.792 ms 93.627 ms]
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low severe
  2 (2.00%) high mild
  4 (4.00%) high severe
delays/nodelay          time:   [33.833 ms 34.452 ms 35.147 ms]
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

Use Case

Reducing latency for AWS API calls

Proposed Solution

Either a new config variable, or simply default it to true.

Other Information

As a workaround, you can do this:

static HTTPS_NATIVE_ROOTS: Lazy<HttpsConnector<HttpConnector>> = Lazy::new(|| {
    let mut http = HttpConnector::new();
    // HttpConnector won't enforce scheme, but HttpsConnector will
    http.enforce_http(false);
    // Set SO_NODELAY, which we have found significantly improves Lambda invocation latency
    http.set_nodelay(true);
    hyper_rustls::HttpsConnectorBuilder::new()
        .with_tls_config(
            rustls::ClientConfig::builder()
                .with_cipher_suites(&[
                    // TLS1.3 suites
                    rustls::cipher_suite::TLS13_AES_256_GCM_SHA384,
                    rustls::cipher_suite::TLS13_AES_128_GCM_SHA256,
                    // TLS1.2 suites
                    rustls::cipher_suite::TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,
                    rustls::cipher_suite::TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
                    rustls::cipher_suite::TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
                    rustls::cipher_suite::TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
                    rustls::cipher_suite::TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256,
                ])
                .with_safe_default_kx_groups()
                .with_safe_default_protocol_versions()
                .expect("Error with the TLS configuration. Please file a bug report under https://github.com/restatedev/restate/issues.")
                .with_native_roots()
                .with_no_client_auth()
        )
        .https_or_http()
        .enable_http1()
        .enable_http2()
        .wrap_connector(http)
});

let mut config = aws_config::defaults(BehaviorVersion::latest());
config = config.http_client(HyperClientBuilder::new().build(HTTPS_NATIVE_ROOTS.clone()));

Acknowledgements

  • [X] I may be able to implement this feature request
  • [ ] This feature might incur a breaking change

A note for the community

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue, please leave a comment

jackkleeman avatar Nov 16 '23 18:11 jackkleeman

Hey @jackkleeman, thanks for submitting this issue. We'll add it to our backlog.

I looked at this and came up with a few findings:

  • SO_NODELAY and TCP_NODELAY are two names for the same thing.
  • We should add a new nodelay setting to the HttpConnectorSettings and HttpConnectorSettingsBuilder structs.
  • I'm thinking that this value would be passed to the default hyper connector in HyperConnectorBuilder::build.

Currently, adding support won't be simple because (afaict) we don't have easy access to the hyper::client::connect::HttpConnector used in our TLS connector meaning we can't just call HttpConnector::set_nodelay.

One last thing to note: in hyper v1.0, HttpConnector got moved to hyper-util.

Velfi avatar Nov 20 '23 21:11 Velfi

Yes, given that the Https connector is currently in a lazy static and then just cloned, the only easy thing to do here is to change the default, which I guess is a big decision

As a workaround, users can produce their own https connector by copying and pasting the defaults used by aws, and provide it at config time. Thats how I have fixed it on my end

jackkleeman avatar Nov 21 '23 10:11 jackkleeman

if you have a snippet could you paste it on this issue? That will be very helpful for other folks looking for the same behavior

rcoh avatar Nov 22 '23 14:11 rcoh

Done @rcoh

jackkleeman avatar Nov 22 '23 22:11 jackkleeman