aws-sdk-java-v2

TLS setup as part of channel acquire

Open dagnir opened this issue 3 years ago • 3 comments

Motivation and Context

Fix confusing error behavior where channel acquisition completes, but the channel then appears to be closed when the request is made because TLS setup did not complete successfully.

Modifications

This commit factors TLS negotiation into the channel pool's acquire logic. As a result, when the Netty client successfully acquires a connection from the channel pool, TLS negotiation has already completed successfully.
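To illustrate the idea, here is a minimal JDK-only sketch, not the code from this change. It models "acquire" as a future that completes only after a handshake future has also succeeded. `FakeChannel`, `TlsAwareAcquireSketch`, and the `handshake` function are hypothetical stand-ins; the real implementation works with Netty channels and the SDK's own pool classes, which are not shown here.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Function;

public class TlsAwareAcquireSketch {

    /** Hypothetical stand-in for a pooled connection; not an SDK or Netty type. */
    static final class FakeChannel {
        final String id;
        boolean open = true;

        FakeChannel(String id) {
            this.id = id;
        }

        void close() {
            open = false;
        }
    }

    /**
     * Completes only after both the raw acquire and the handshake succeed.
     * If the handshake fails, the channel is closed and the returned future
     * fails, so the caller never receives a channel whose TLS setup did not finish.
     */
    static CompletableFuture<FakeChannel> acquire(
            CompletableFuture<FakeChannel> rawAcquire,
            Function<FakeChannel, CompletableFuture<Void>> handshake) {
        return rawAcquire.thenCompose(ch ->
                handshake.apply(ch).<FakeChannel>handle((ignored, err) -> {
                    if (err != null) {
                        ch.close();
                        throw new CompletionException("TLS handshake failed during acquire", err);
                    }
                    return ch;
                }));
    }

    public static void main(String[] args) {
        // Handshake succeeds: the caller gets an open channel.
        FakeChannel good = acquire(
                CompletableFuture.completedFuture(new FakeChannel("ch-1")),
                ch -> CompletableFuture.completedFuture(null)).join();
        System.out.println("acquired " + good.id + ", open=" + good.open);

        // Handshake fails: the acquire itself fails and the channel is closed.
        FakeChannel bad = new FakeChannel("ch-2");
        CompletableFuture<Void> failedHandshake = new CompletableFuture<>();
        failedHandshake.completeExceptionally(new IOException("handshake aborted"));
        CompletableFuture<FakeChannel> result =
                acquire(CompletableFuture.completedFuture(bad), ch -> failedHandshake);
        System.out.println("failed=" + result.isCompletedExceptionally() + ", open=" + bad.open);
    }
}
```

In this model a handshake failure surfaces as a failed acquire rather than as a closed channel discovered later, when the request is made. The trade-off is that the time spent in acquire now includes the handshake.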

Testing

  • New unit tests
  • Running integ tests

Screenshots (if appropriate)

Types of changes

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)

Checklist

  • [x] I have read the CONTRIBUTING document
  • [x] Local run of mvn install succeeds
  • [x] My code follows the code style of this project
  • [x] My change requires a change to the Javadoc documentation
  • [x] I have updated the Javadoc documentation accordingly
  • [x] I have added tests to cover my changes
  • [x] All new and existing tests passed
  • [x] I have added a changelog entry. Adding a new entry must be accomplished by running the scripts/new-change script and following the instructions. Commit the new file created by the script in .changes/next-release with your changes.
  • [ ] My change is to implement 1.11 parity feature and I have updated LaunchChangelog

License

  • [x] I confirm that this pull request can be released under the Apache 2 license

dagnir avatar Feb 24 '22 00:02 dagnir

Can we run benchmarks to see how much performance impact this change would cause?

zoewangg avatar Mar 10 '22 23:03 zoewangg

> Can we run benchmarks to see how much performance impact this change would cause?

I ran a test with the following code:

Test code

    public static void main(String[] args) throws InterruptedException {
        CloudWatchMetricPublisher metricPublisher = CloudWatchMetricPublisher.builder()
                .namespace("DongieTlsTesting")
                .cloudWatchClient(CloudWatchAsyncClient.builder()
                        .region(Region.US_WEST_2)
                        .credentialsProvider(EnvironmentVariableCredentialsProvider.create())
                        .build())
                .build();

        int nThreads = Runtime.getRuntime().availableProcessors() * 2;

        ExecutorService exec = Executors.newFixedThreadPool(nThreads);

        System.out.printf("Spinning up %d threads%n", nThreads);
        for (int i = 0; i < nThreads; ++i) {
            exec.submit(() -> {
                while (true) {
                    S3AsyncClient s3 = null;
                    try {
                        // Always create a new client to defeat pooling
                        s3 = S3AsyncClient.builder()
                                .region(Region.US_WEST_2)
                                .credentialsProvider(EnvironmentVariableCredentialsProvider.create())
                                .overrideConfiguration(o -> o.addMetricPublisher(metricPublisher))
                                .build();

                        s3.getObject(r -> r.bucket(BUCKET).key(KEY), AsyncResponseTransformer.toBytes()).join();
                    } catch (Exception e) {
                        System.out.printf("ERROR [%s]: %s%n", Thread.currentThread().getName(), e.getMessage());
                    } finally {
                        if (s3 != null) {
                            s3.close();
                        }
                    }
                }
            });
        }

        Thread.sleep(Duration.ofMinutes(15).toMillis());

        System.out.println("Done.");
        exec.shutdown();
    }

Predictably, ConcurrencyAcquireDuration is higher (13-14 ms vs. 7 ms) because the complete handshake is now factored in. Interestingly, though, both ServiceCallDuration and ApiCallDuration are lower overall, probably because they previously included at least part of the TLS handshake:

[Screenshot: Screen Shot 2022-03-25 at 3.51.18 PM]

I'm also pulling this change in internally to test in the canaries.

dagnir avatar Mar 25 '22 22:03 dagnir

Closing this for now because of performance regression concerns.

dagnir avatar Sep 29 '22 20:09 dagnir