async-http-client RFC: design suggestion: Make this a "3-tier library"

At the moment, we have a pretty gnarly issue with AHC's API: Settings can either be supplied globally or per request. So if you want to set certain configuration for a certain component in your app (say TLS config, redirect config, ...) then you either have to specify it for every single request (tedious) or globally (expensive as you'll get a new connection pool, need to maintain lifecycle etc).

What I suggest here is making AHC a "3-tier library":

tier 1: What is currently HTTPClient: the highly stateful, expensive, and lifecycle-managed object that owns the TLSContext, the connection pool, and if not .shared also the ELG
tier 2: (doesn't exist right now) A very lightweight (probably struct) and not lifecycle managed client object which is essentially a reference to the tier 1 object and a bag of configuration
tier 3: the individual requests

An API sketch could look like this:

// tier 1
let http = HTTPService(eventLoopGroupProvider: .shared(myELG)) // This is currently called `HTTPClient`
defer {
    try! http.syncShutdown()
}

// tier 2 (creating this is ~free, doesn't need to be shut down)
let httpClient = http.client(tlsConfiguration: specialConfig, redirectConfig: ...)

// tier 3
let result = httpClient.get("https://...").wait()

Benefits

Allows wider sharing of Tier 1 because it doesn't take configuration
Makes the API nicer to use because we can supply configuration to Tier 2 so we don't need to pass it to every request
Would allow Tier 2 to grow a "middleware" concept which would allow doing things like authenticating requests or controlling redirects etc without hard-wiring all that into the core HTTP client.

Important requirements per tier

Tier 1

do not take any configuration apart from the ELG

Tier 2

doesn't need lifecycle management
holds bags of configuration
~free to create

Tier 3

can override most(all?) of the configuration set in Tier 2

Jul 07 '21 13:07 weissi

I agree with most of the things you mention. And I'm +1 on the general proposal. However I don't get why we should be able to override most settings in tier3. If tier2 is basically free to create, I don't see any benefit by overriding the settings. In my point of view the client should be a struct with value semantics.

Jul 07 '21 13:07 fabianfett

@fabianfett good point! I don't know either :D

Jul 07 '21 13:07 weissi

Different TLS configurations interact somewhat poorly with the connection pool, as they produce new keys. So this isn't entirely a panacea for the connection pool re-use. Otherwise agreed.

Jul 07 '21 13:07 Lukasa

Different TLS configurations interact somewhat poorly with the connection pool, as they produce new keys. So this isn't entirely a panacea for the connection pool re-use. Otherwise agreed.

Agreed, but we already have that issue/feature today. And we added it because people very commonly requested it, usually because the need a different trust roots/client certs to talk to different services.

Jul 07 '21 13:07 weissi

And I would suggest a rename HTTPService is in my point of view a connection pool manager (and all connection pool settings need to be defined on this level). Since HTTPConnectionPoolManager is a mouthful and most users won't care about the exact semantics, I think I would prefer to call this HTTPConnectionPool.

However this leads us to the point where we have a HTTPConnectionPool which is actually a ConnectionPoolManager. Naming is hard. What are your thoughts?

Jul 07 '21 13:07 fabianfett

And I would suggest a rename HTTPService is in my point of view a connection pool manager (and all connection pool settings need to be defined on this level). Since HTTPConnectionPoolManager is a mouthful and most users won't care about the exact semantics, I think I would prefer to call this HTTPConnectionPool.

I don't like giving it such a specific name because it can be more than a connection pool. And also, most users have no idea what a connection pool is.

Jul 07 '21 13:07 weissi

also CC @adam-fowler , I think https://github.com/soto-project/soto has a similar-ish design?

Jul 07 '21 13:07 weissi

Soto has two tiers, in that you have

AWSClient where every public function takes a AWSServiceConfig parameter and then
Every service object (S3, DynamoDB etc) has an instance of a AWSServiceConfig which it passes to AWSClient when it calls AWSClient methods.

The AWSServiceConfig is used to define endpoints, protocol used, server region, signing name etc).

In theory you could call the AWSClient public functions directly but in reality nobody does and will always do it via a service object. This design came from @fabianfett. Previously in Soto v4 we had one object which combined the service and client.

I guess there are similarities to what is described above in that AWSClient would be your tier 1 object and the service objects would be your tier 2 objects. In situations where I have had config options I wanted to easily change on a per call basis (what appears to be your tier 3) I have added a .with function which creates a copy of the service object with edited attributes. For instance

let s3 = S3(client: awsClient, region: .euwest1)
s3.getObject(bucket: "my-bucket", key: "my-file")
s3.with(options: .s3UseDualStackEndpoint).getObject(bucket: "my-bucket", key: "my-file")
s3.with(timeout: .minutes(15)).getObject(bucket: "my-bucket", key: "my-file")

Jul 07 '21 15:07 adam-fowler

More previous art… Go‘s http client has three levels as well:

Transport - which is an equivalent to a connection pool
Client
Request

A client can be created with a transport, but it doesn’t require a transport.

Aug 11 '21 21:08 fabianfett

@fabianfett Thanks! I kinda like HTTPTransport, HTTPClient, HTTPRequest. Don't think "transport" really makes sense but it's not too terrible either :).

Aug 11 '21 21:08 weissi

@fabianfett / @Lukasa / @adam-fowler I think we need to make progress on this issue. For a user of AsyncHTTPClient I see the following problem over and over again:

Let's say I have a library that internally does some HTTP. Currently, I have two options for it:

Have the library internally create & own a HTTPClient instance which it'll share with nothing else. That forces the library's API to have full-blown lifecycle management (so it can shut down the HTTPClient). Also anything TLS/proxy config related becomes a pain (because now the library will need to accept a lot of config that it can then pass down to HTTPClient on creation).
Accept a HTTPClient instance to share. From a resource perspective that's nice because the library can now share the HTTPClient with others. Proxy & TLS config also becomes easier because the user can pre-configure the HTTPClient to do the right thing. Additionally, the library might now get away w/o lifecycle management (because it doesn't own the HTTPClient, it just uses it). But unfortunately now I can't control anything w.r.t. timeouts or redirect configuration (because that's a global config on HTTPClient). So it's actually possible for the library to not work because say it needs to follow redirects or the connect timeout is too short or something else.

Both options feel bad but I run into them all the time :(.

Jun 01 '22 09:06 weissi

async-http-client async-http-client copied to clipboard

RFC: design suggestion: Make this a "3-tier library"

Benefits

Important requirements per tier

Tier 1

Tier 2

Tier 3

async-http-client
async-http-client copied to clipboard