s3fs-nio

Analyze options to use SdkHttpClient implementations

Open ptirador opened this issue 5 years ago • 5 comments

Task Description

The S3Factory class builds new Amazon S3 client instances, and it currently uses the Apache HTTP client.

As noted in this Pull Request discussion, this locks customers in to the ApacheHttpClient, which adds a dependency they may not want. We need to provide an option for other SdkHttpClient implementations.

The UrlConnectionHttpClient is a fairly popular choice in Java-based Lambda functions because it starts up faster and therefore has less impact on cold starts.
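One way the factory could stop hard-coding a single client is to resolve the implementation from a configuration value. The sketch below is only an illustration of that idea using plain JDK types: the class name `HttpClientSelector`, the registration keys, and the default are all hypothetical and not part of s3fs-nio or the AWS SDK. In a real factory the type parameter would presumably be `SdkHttpClient`, and the suppliers would build the actual clients (for example `ApacheHttpClient.builder().build()`).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical helper: maps a configuration value to a supplier that
 * creates the matching HTTP client. T stands in for SdkHttpClient.
 */
public final class HttpClientSelector<T> {

    private final Map<String, Supplier<? extends T>> suppliers = new LinkedHashMap<>();
    private final String defaultName;

    public HttpClientSelector(String defaultName) {
        this.defaultName = defaultName;
    }

    /** Registers a client implementation under a case-insensitive name. */
    public HttpClientSelector<T> register(String name, Supplier<? extends T> supplier) {
        suppliers.put(normalize(name), supplier);
        return this;
    }

    /** Creates the requested client, or the default one if no name is given. */
    public T create(String requestedName) {
        boolean missing = requestedName == null || requestedName.trim().isEmpty();
        String key = normalize(missing ? defaultName : requestedName);
        Supplier<? extends T> supplier = suppliers.get(key);
        if (supplier == null) {
            throw new IllegalArgumentException(
                "Unknown HTTP client '" + key + "', expected one of " + suppliers.keySet());
        }
        // The client is only instantiated here, so unused implementations cost nothing.
        return supplier.get();
    }

    private static String normalize(String name) {
        return name.trim().toLowerCase(Locale.ROOT);
    }
}
```

Because the suppliers are evaluated lazily, an implementation that is registered but never requested is never instantiated. Whether its classes must still be on the classpath depends on how the suppliers are written, so this alone does not remove the dependency.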

Tasks

The following tasks will need to be carried out:

  • [ ] Analyze options to use SdkHttpClient implementations

Task Relationships

This task:

  • Is a follow-up of: #63

Useful Links

Help

  • Our chat channel
  • Points of contact:
    • @carlspring
    • @steve-todorov

ptirador, Oct 31 '20 16:10

Pros

Use the built-in HttpUrlConnection client to reduce instantiation time

The AWS Java SDK 2.x includes a pluggable HTTP layer that allows customers to switch to different HTTP implementations. Three HTTP clients are supported out-of-the-box:

  • Apache HTTP client
  • Netty HTTP client
  • Java HTTP URL Connection client.

With the default configuration, Apache HTTP client and Netty HTTP client are used for synchronous clients and asynchronous clients respectively. They are powerful HTTP clients with more features. However, they come at the cost of higher instantiation time.

On the other hand, the JDK built-in HttpURLConnection library:

  • Is more lightweight and has lower instantiation time.
  • Is part of the JDK, so using it does not bring in external dependencies. This keeps the deployment package small and thus reduces the time it takes to download and unpack it.

Hence, it's recommended to use UrlConnectionHttpClient when configuring the SDK client. Note that it only supports synchronous API calls. If we'd like to see support for asynchronous SDK clients with the JDK 11 built-in HTTP client, please upvote this GitHub issue.

Exclude unused SDK HTTP dependencies

The SDK by default includes the Apache HTTP client and Netty HTTP client dependencies. If startup time is important to your application and you do not need both implementations, it's recommended to exclude the unused SDK HTTP dependencies to minimize the deployment package size. Below is a sample Maven POM file for an application that only uses url-connection-client and excludes netty-nio-client and apache-client.

    <dependencies>
        <dependency>
            <groupId>software.amazon.awssdk</groupId>
            <artifactId>s3</artifactId>
            <exclusions>
                <exclusion>
                    <groupId>software.amazon.awssdk</groupId>
                    <artifactId>netty-nio-client</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>software.amazon.awssdk</groupId>
                    <artifactId>apache-client</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>software.amazon.awssdk</groupId>
            <artifactId>url-connection-client</artifactId>
        </dependency>
    </dependencies>

Cons

Inconveniences of using the built-in HttpURLConnection client

Because the JDK built-in HttpURLConnection client is more lightweight, it also offers fewer configuration options. Compared to the Apache HTTP client, for example, you cannot configure:

  • the maximum number of connections, which would be useful in environments where you may want to share a single connection pool among multiple AWS services.
  • an HTTP/HTTPS proxy connection.

FYI @carlspring @steve-todorov

ptirador, Jan 23 '21 12:01

Hi @ptirador ,

Thanks for your investigation!

What do you mean by "deployment package"?

In my opinion, we need to have support for both synchronous and asynchronous requests. If we need the Apache + Netty dependencies for this, then so be it. There are many other things that you can't do with HttpURLConnection, like setting up connection pools (if I recall correctly).

How much of a difference is there in terms of instantiation time?

And the other question -- are we using async requests for anything right now? What use cases would we have for this?

My only concern is that, at the moment, we claim to support JDK 11 (which is indeed the case), so whatever we decide must not break our JDK 11 support.

Which option would you advise, and what is your personal preference?

carlspring, Jan 23 '21 21:01

Thanks @ptirador for raising this issue and doing the initial research!

How did you come to the conclusion that using the built-in HttpURLConnection client is faster? Did you run a JMH benchmark that backs this statement with data?

Honestly, if I had to pick one of the three options above, I'd go with netty-nio-client and async connections as the default option. In my experience, using Netty with a proper async implementation results in much better throughput and overall performance than a blocking / sync approach. Also, if you're already using Cassandra or something similar, chances are high that you are already using Netty.

If you are up for the task, we can create a JMH benchmark that tests the different implementations so we can make a decision based on the data.

steve-todorov, Jan 25 '21 12:01

Hi @carlspring @steve-todorov,

The conclusions I wrote are based on this article, which discusses these instantiation times but does not provide any benchmark example. We can create this JMH benchmark to test them.
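As a stopgap before a proper benchmark harness is in place, a rough first measurement could look like the sketch below. It is a plain `System.nanoTime()` timer, not a real benchmark: it does no warm-up control or JVM forking, so its numbers are only indicative. The class name `InstantiationTimer` is hypothetical. The first sample is reported separately from the median because the cold-start concern is mostly about the first instantiation, which includes class loading.

```java
import java.util.Arrays;
import java.util.function.Supplier;

/** Hypothetical helper for a rough look at object instantiation cost. */
public final class InstantiationTimer {

    private InstantiationTimer() {
    }

    /**
     * Calls factory.get() the given number of times and returns
     * { first sample, median sample } in nanoseconds. For an even
     * number of runs the upper of the two middle samples is used.
     */
    public static long[] measure(Supplier<?> factory, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            Object created = factory.get();
            samples[i] = System.nanoTime() - start;
            if (created == null) {
                throw new IllegalStateException("factory returned null");
            }
        }
        long first = samples[0]; // includes class loading on a cold JVM
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return new long[] {first, sorted[runs / 2]};
    }
}
```

With the SDK on the classpath, each HTTP client could be timed by passing a supplier that builds it, for example `InstantiationTimer.measure(() -> UrlConnectionHttpClient.builder().build(), 20)`. For the first sample to reflect a cold start, each implementation would need to be measured in a fresh JVM. The created clients are not closed here, which a real measurement should handle.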

In my opinion, I would also go with Netty and async connections, especially because of the overall performance boost they provide. Also, a few months ago we switched the NIO implementation to use AsynchronousFileChannel instead of FileChannel, so I think it could be the best way to go.

ptirador, Jan 25 '21 19:01

Hi @ptirador ,

I believe you and @steve-todorov are right -- we should use Netty, since indeed we did switch to AsynchronousFileChannel, as you've just reminded me.

How much effort will this task be?

carlspring, Jan 25 '21 19:01